Abstract 

Alongside the development of quantum algorithms and quantum complexity theory in recent 
years, quantum techniques have also proved instrumental in obtaining results in diverse classical 
(non-quantum) areas, such as coding theory, communication complexity, and polynomial 
approximations. In this paper we survey these results and the quantum toolbox they use. 



1 Introduction 

1.1 Surprising proof methods 

Mathematics is full of surprising proofs, and these form a large part of the beauty and fascination 
of the subject to its practitioners. A feature of many such proofs is that they introduce objects or 
concepts from beyond the "milieu" in which the problem was originally posed. 

As an example from high-school math, the easiest way to prove real-valued identities like 

$$\cos(x + y) = \cos x \cos y - \sin x \sin y$$

is to go to complex numbers: using the identity $e^{ix} = \cos x + i \sin x$ we have 

$$e^{i(x+y)} = e^{ix} e^{iy} = (\cos x + i \sin x)(\cos y + i \sin y) = \cos x \cos y - \sin x \sin y + i(\cos x \sin y + \sin x \cos y).$$

Taking the real parts of the two sides gives our identity. 

Another example is the probabilistic method, associated with Paul Erdős and excellently covered 
in the book of Alon and Spencer [13]. The idea here is to prove the existence of an object with a 
specific desirable property P by choosing such an object at random, and showing that it satisfies P 
with positive probability. Here is a simple example: suppose we want to prove that every undirected 
graph $G = (V, E)$ with $|E| = m$ edges has a cut (a partition $V = V_1 \cup V_2$ of its vertex set) with at 
least $m/2$ edges crossing the cut. 

Proof. Choose the cut at random, by including each vertex $i$ in $V_1$ with probability 
1/2 (independently of the other vertices). For each fixed edge (i, j), the probability that 
it crosses is the probability that i and j end up in different sets, which is exactly 1/2. 
Hence by linearity of expectation, the expected number of crossing edges for our cut is 
exactly m/2. But then there must exist a specific cut with at least m/2 crossing edges. 
□ 

The statement of the theorem has nothing to do with probability, yet probabilistic methods allow us 
to give a very simple proof. Alon and Spencer [13] give many other examples of this phenomenon, 
in areas ranging from graph theory and analysis to combinatorics and computer science. 
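
The averaging argument in the proof above can be checked by brute force on a small instance. The following Python sketch enumerates every cut of a hypothetical 5-vertex graph (chosen here purely for illustration, not taken from the text), computes the exact expected number of crossing edges under the uniformly random choice, and confirms that some cut does at least as well as the average.

```python
from itertools import product

def cut_size(edges, side):
    """Number of edges whose endpoints lie on different sides of the cut."""
    return sum(1 for (i, j) in edges if side[i] != side[j])

# Hypothetical example graph: a 5-cycle plus one chord (m = 6 edges).
num_vertices = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
m = len(edges)

# Each of the 2^|V| side assignments is equally likely under the random choice.
sizes = [cut_size(edges, side) for side in product([0, 1], repeat=num_vertices)]

expected = sum(sizes) / len(sizes)  # exact expectation over uniformly random cuts
best = max(sizes)                   # the largest cut found by enumeration

print(expected, best)
```

Enumeration is only feasible for tiny graphs; the point of the probabilistic proof is that it needs no enumeration at all.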

Two special cases of the probabilistic method deserve mention here. First, one can combine the 
language of probability with that of information theory [30]. For instance, if a random variable $X$ 
is uniformly distributed over some finite set $S$ then its Shannon entropy 
$H(X) = -\sum_{x \in S} \Pr[X = x] \log \Pr[X = x]$ is exactly $\log |S|$. Hence upper (resp. lower) bounds on this entropy give upper 
(resp. lower) bounds on the size of $S$. Information theory offers many tools that allow us to 
manipulate and bound entropies in sophisticated yet intuitive ways. The following example is 
due to Peter Frankl. In theoretical computer science one often has to bound sums of binomial 
coefficients like 

$$s = \sum_{i=0}^{\alpha n} \binom{n}{i},$$

say for some $\alpha < 1/2$. This $s$ is exactly the size of the set $S \subseteq \{0,1\}^n$ of strings of Hamming weight 
at most $\alpha n$. Choose $X = (X_1, \ldots, X_n)$ uniformly at random from $S$. Then, individually, each $X_i$ 
is a bit whose probability of being 1 is at most $\alpha$, and hence $H(X_i) \le H(\alpha) = -\alpha \log \alpha - (1 - \alpha) \log(1 - \alpha)$. 
Using the sub-additivity of entropy we obtain an essentially tight upper bound on 
the size of $S$: 

$$\log s = \log |S| = H(X) \le \sum_{i=1}^{n} H(X_i) \le n H(\alpha).$$
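
As a numerical sanity check of this entropy bound, the sketch below compares $\log s$ with $nH(\alpha)$ for a few parameter choices. The parameter values and function names are illustrative assumptions; logarithms are taken base 2, and $\alpha n$ is rounded down to an integer.

```python
from math import comb, log2

def binary_entropy(a):
    """H(a) = -a log a - (1 - a) log(1 - a), logs base 2, with H(0) = H(1) = 0."""
    if a in (0.0, 1.0):
        return 0.0
    return -a * log2(a) - (1 - a) * log2(1 - a)

def hamming_ball_size(n, alpha):
    """s = sum over i <= alpha*n of C(n, i): number of n-bit strings of weight at most alpha*n."""
    return sum(comb(n, i) for i in range(int(alpha * n) + 1))

# Illustrative parameters (assumed here), all with alpha < 1/2.
results = []
for n, alpha in [(20, 0.25), (100, 0.1), (200, 0.4)]:
    s = hamming_ball_size(n, alpha)
    results.append((log2(s), n * binary_entropy(alpha)))

for log_s, bound in results:
    print(round(log_s, 2), "<=", round(bound, 2))
```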

A second, related but more algorithmic approach is the so-called "incompressibility method," which 
reasons about the properties of randomly chosen objects and is based on the theory of Kolmogorov 
complexity [84, Chapter 6]. In this method we consider "compression schemes," that is, injective 
mappings C from binary strings to other binary strings. The basic observation is that for any C 
and n, most strings of length n map to strings of length nearly n or more, simply because there 
aren't enough short descriptions to go round. Thus, if we can design some compression scheme 
that represents n-bit objects that do not have some desirable property P with much fewer than n 
bits, it follows that most n-bit strings have property P. 
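
The counting fact behind the incompressibility method is easy to make explicit. The sketch below (the function name and the parameter $k$ are introduced here for illustration) bounds the fraction of $n$-bit strings that any injective map can send to strings of length less than $n - k$: there are only $2^{n-k} - 1$ such short strings, so the fraction is below $2^{-k}$.

```python
def max_fraction_compressible(n, k):
    """Upper bound on the fraction of n-bit strings that an injective map can send
    to binary strings of length less than n - k."""
    # Count all binary strings of lengths 0, 1, ..., n - k - 1.
    short_strings = sum(2 ** length for length in range(n - k))
    return short_strings / 2 ** n

for k in (1, 5, 10):
    print(k, max_fraction_compressible(20, k), 2 ** -k)
```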

Of course one can argue that applications of the probabilistic method are all just counting 
arguments disguised in the language of probability, and hence probabilistic arguments are not essential 
to the proof. In a narrow sense this is indeed correct. However, viewing things probabilistically 
gives a rather different perspective and allows us to use sophisticated tools to bound probabilities, 
such as large deviation inequalities and the Lovász Local Lemma, as exemplified in [13]. While such 
tools may be viewed as elaborate ways of doing a counting argument, the point is that we might 
never think of using them if the argument were phrased in terms of counting instead of probability. 
Similarly, arguments based on information theory or incompressibility are essentially "just" 
counting arguments, but the information-theoretic and algorithmic perspective leads to proofs we would 
not easily discover otherwise. 

1.2 A quantum method? 

The purpose of this paper is to survey another family of surprising proofs that use the language 
and techniques of quantum computing to prove theorems whose statement has nothing to do with 
quantum computing. 

Since the mid-1990s, especially since Peter Shor's 1994 quantum algorithm for factoring large 
integers [117], quantum computing has grown to become a prominent and promising area at the 
intersection of computer science and physics. Quantum computers could yield fundamental 
improvements in algorithms, communication protocols, and cryptography. This promise, however, 
depends on physical realization, and despite the best efforts of experimental physicists we are still 
very far from building large-scale quantum computers. 

In contrast, using the language and tools of quantum computing as a proof tool is something 
we can do today. Here, quantum mechanics is purely a mathematical framework, and our proofs 
remain valid even if large-scale quantum computers are never built (or worse, if quantum mechanics 
turns out to be wrong as a description of reality). This paper describes a number of recent results 
of this type. As with the probabilistic method, these applications range over many areas, from 
error-correcting codes and complexity theory to purely mathematical questions about polynomial 
approximations and matrix theory. We hesitate to say that they represent a "quantum method," 
since the set of tools is far less developed than the probabilistic method. However, we feel that 
these quantum tools will yield more surprises in the future, and have the potential to grow into a 
full-fledged proof method. 

As we will see below, the language of quantum computing is really just a shorthand for linear 
algebra: states are vectors and operations are matrices. Accordingly, one could argue that we don't 
need the quantum language at all. Indeed, one can always translate the proofs given below back 
to the language of linear algebra. What's more, there is already an extensive tradition of elegant 
proofs in combinatorics, geometry, and other areas, which employ linear algebra (often over finite 
fields) in surprising ways. For two surveys of this linear algebra method, see the books by Babai and 
Frankl [18] and Jukna [66, Part III]. However, we feel the proofs we survey here are of a different 
nature than those produced by the classical linear algebra method. Just as thinking probabilistically 
suggests strategies that might not occur when taking the counting perspective, couching a problem 
in the language of quantum algorithms and quantum information gives us access to intuitions and 
tools that we would otherwise likely overlook or consider unnatural. While certainly not a cure-all, 
for some types of problems the quantum perspective is a very useful one and there is no reason to 
restrict oneself to the language of linear algebra. 

1.3 Outline 

The survey is organized as follows. We begin in Section 2 with a succinct introduction to the 
quantum model and the properties used in our applications. Most of those applications can be 
conveniently classified in two broad categories. First, there are applications that are close in spirit 
to the classical information-theory method. They use quantum information theory to bound the 
dimension of a quantum system, analogously to how classical information theory can be used to 
bound the size of a set. In Section 3 we give three results of this type. Other applications use 
quantum algorithms as a tool to define polynomials with desired properties. In Section 4 we give a 
number of applications of this type. Finally, there are a number of applications of quantum tools 
that do not fit well in the previous two categories; some of these are classical results more indirectly 
"inspired" by earlier quantum results. These are described in Section 5. 

2 The quantum toolbox 

The goal of this survey is to show how quantum techniques can be used to analyze non-quantum 
questions. Of course, this requires at least some knowledge of quantum mechanics, which might 
appear discouraging to those without a physics background. However, the amount of quantum 
mechanics one needs is surprisingly small and easily explained in terms of basic linear algebra. 
The first thing we would like to convey is that at the basic level, quantum mechanics is not a 
full-fledged theory of the universe (containing claims about which objects and forces "really exist"), 
but rather a framework in which to describe physical systems and processes they undergo. Within 
this framework we can posit the existence of basic units of quantum information ("qubits") and 
ways of transforming them, just as classical theoretical computer science begins by positing the 
existence of bits and the ability to perform basic logical operations on them. While we hope this is 
reassuring, it is nevertheless true that the quantum-mechanical framework has strange and novel 
aspects — which, of course, is what makes it worth studying in the first place. 

In this section we give a bare-bones introduction to the essentials of quantum mechanics and 
quantum computing. (A more general framework for quantum mechanics is given in Appendix A, 
but we will not need it for the results we describe.) We then give some specific useful results from 
quantum information theory and quantum algorithms. 



2.1 The quantum model 

At a very general level, any physical system is associated with a Hilbert space, and a state of that 
system is described by an element of that Hilbert space. The Hilbert space corresponding to the 
combination of two physical systems is the tensor product of their respective Hilbert spaces. 



Pure states For our purposes, a pure quantum state (often just called a state) is a unit column 
vector in a $d$-dimensional complex vector space $\mathbb{C}^d$. Quantum physics typically uses the Dirac 
notation, writing a column vector $v$ as $|v\rangle$, while $\langle v|$ denotes the row vector that is the conjugate 
transpose of $v$. 

The simplest nontrivial example is the case of a 2-dimensional system, called a qubit. We 
identify the two possible values of a classical bit with the two vectors in the standard orthonormal 
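basis for this space: 

$$|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

In general, the state of a qubit can be a superposition (i.e., linear combination) of these two values: 

$$|\phi\rangle = \alpha_0 |0\rangle + \alpha_1 |1\rangle = \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix},$$

where the complex numbers are called amplitudes; $\alpha_0$ is the amplitude of basis state $|0\rangle$, and $\alpha_1$ is 
the amplitude of $|1\rangle$. Since a state is a unit vector, we have $|\alpha_0|^2 + |\alpha_1|^2 = 1$. 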

A 2-qubit space is obtained by taking the tensor product of two 1-qubit spaces. This is most 
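easily explained by giving the four basis vectors of the tensor space: 

$$|00\rangle = |0\rangle \otimes |0\rangle = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad |01\rangle = |0\rangle \otimes |1\rangle = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix},$$

$$|10\rangle = |1\rangle \otimes |0\rangle = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \qquad |11\rangle = |1\rangle \otimes |1\rangle = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}.$$

These correspond to the four possible 2-bit strings. More generally, we can form $2^n$-dimensional 
spaces this way whose basis states correspond to the $2^n$ different $n$-bit strings. 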

We also sometimes use $d$-dimensional spaces without such a qubit structure. Here we usually 
denote the $d$ standard orthonormal basis vectors with $|1\rangle, \ldots, |d\rangle$, where $|i\rangle_i = 1$ and $|i\rangle_j = 0$ for 
all $j \neq i$. For a vector $|\phi\rangle = \sum_{i=1}^{d} \alpha_i |i\rangle$ in this space, $\langle\phi| = \sum_{i=1}^{d} \alpha_i^* \langle i|$ is the row vector that is 
the conjugate transpose of $|\phi\rangle$. The Dirac notation allows us for instance to conveniently write the 
standard inner product between states $|\phi\rangle$ and $|\psi\rangle$ as $\langle\phi| \cdot |\psi\rangle = \langle\phi|\psi\rangle$. This inner product induces 
the Euclidean norm (or "length") of vectors: $\|v\| = \sqrt{\langle v|v\rangle}$. 

One can also take tensor products in this space: if $|\phi\rangle = \sum_{i \in [m]} \alpha_i |i\rangle$ and $|\psi\rangle = \sum_{j \in [n]} \beta_j |j\rangle$, 
then their tensor product $|\phi\rangle \otimes |\psi\rangle \in \mathbb{C}^{mn}$ is 

$$|\phi\rangle \otimes |\psi\rangle = \sum_{i \in [m],\, j \in [n]} \alpha_i \beta_j |i, j\rangle,$$

where $[n]$ denotes the set $\{1, \ldots, n\}$ and the vectors $|i, j\rangle = |i\rangle \otimes |j\rangle$ form an orthonormal basis for 
$\mathbb{C}^{mn}$. This tensor product of states $|\phi\rangle$ and $|\psi\rangle$ is also often denoted simply as $|\phi\rangle|\psi\rangle$. Note that 
this new state is a unit vector, as it should be. 

Not every pure state in $\mathbb{C}^{mn}$ can be expressed as a tensor product in this way; those that cannot 
are called entangled. The best-known entangled state is the 2-qubit EPR-pair $(1/\sqrt{2})(|00\rangle + |11\rangle)$, 
named after the authors of the paper [33]. When two separated parties each hold part of such an 
entangled state, we talk about shared entanglement between them. 
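
The distinction between product states and entangled states can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions: states are plain lists of amplitudes, the helper names are made up for this example, and the product test uses the standard fact that a 2-qubit pure state $(a_{00}, a_{01}, a_{10}, a_{11})$ is a tensor product exactly when $a_{00}a_{11} - a_{01}a_{10} = 0$.

```python
import math

def tensor(phi, psi):
    """Amplitudes of |phi> (x) |psi>, ordered as |i, j> with j varying fastest."""
    return [a * b for a in phi for b in psi]

def norm(v):
    """Euclidean norm sqrt(<v|v>)."""
    return math.sqrt(sum(abs(a) ** 2 for a in v))

def is_product_2qubit(state, tol=1e-12):
    """True iff the 2x2 matrix of amplitudes has rank 1, i.e. a00*a11 - a01*a10 = 0."""
    a00, a01, a10, a11 = state
    return abs(a00 * a11 - a01 * a10) < tol

s = 1 / math.sqrt(2)
plus = [s, s]              # (|0> + |1>)/sqrt(2)
phase = [s, 1j * s]        # (|0> + i|1>)/sqrt(2)
product_state = tensor(plus, phase)
epr = [s, 0, 0, s]         # (|00> + |11>)/sqrt(2)

print(norm(product_state), is_product_2qubit(product_state), is_product_2qubit(epr))
```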

Transformations There are two things one can do with a quantum state: transform it or 
measure it. Actually, as we will see, measurements can transform the measured states as well; however, 
we reserve the word "transformation" to describe non-measurement change processes, which we 
describe next. Quantum mechanics allows only linear transformations on states. Since these linear 
transformations must map unit vectors to unit vectors, we require them to be norm-preserving 
(equivalently, inner-product-preserving). Norm-preserving linear maps are called unitary. 
Equivalently, these are the $d \times d$ matrices $U$ whose conjugate transpose $U^*$ equals the inverse $U^{-1}$ 
(physicists typically write $U^\dagger$ instead of $U^*$). For our purposes, unitary transformations are exactly 
the transformations that quantum mechanics allows us to apply to states. We will frequently define 
transformations by giving their action on the standard basis, with the understanding that such a 
definition extends (uniquely) to a linear map on the entire space. 

Possibly the most important 1-qubit unitary is the Hadamard transform: 

$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}. \qquad (1)$$

This maps basis state $|0\rangle$ to $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $|1\rangle$ to $\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$. 

Two other types of unitaries deserve special mention. First, for any function $f : \{0,1\}^n \to \{0,1\}^n$, 
define a transformation $U_f$ mapping the joint computational basis state $|x\rangle|y\rangle$ (where $x, y \in \{0,1\}^n$) 
to $|x\rangle|y \oplus f(x)\rangle$, where "$\oplus$" denotes bitwise addition (mod 2) of $n$-bit vectors. Note that $U_f$ 
is a permutation on the orthonormal basis states, and therefore unitary. With such transformations 
we can simulate classical computations. Next, fix a unitary transformation $U$ on a $k$-qubit system, 
and consider the $(k+1)$-qubit unitary transformation $V$ defined by 

$$V(|0\rangle|\phi\rangle) = |0\rangle|\phi\rangle, \qquad V(|1\rangle|\phi\rangle) = |1\rangle U|\phi\rangle. \qquad (2)$$

This $V$ is called a controlled-$U$ operation, and the first qubit is called the control qubit. Intuitively, 
our quantum computer uses the first qubit to "decide" whether or not to apply $U$ to the last $k$ 
qubits. 

Finally, just as one can take the tensor product of quantum states in different registers, one 
can take the tensor product of quantum operations (more generally, of matrices) acting on two 
registers. If $A = (a_{ij})$ is an $m \times m'$ matrix and $B$ is an $n \times n'$ matrix, then their tensor product is 
the $mn \times m'n'$ matrix 

$$A \otimes B = \begin{pmatrix} a_{11}B & \cdots & a_{1m'}B \\ a_{21}B & \cdots & a_{2m'}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mm'}B \end{pmatrix}.$$

Note that the tensor product of two vectors is the special case where $m' = n' = 1$, and that $A \otimes B$ 
is unitary if $A$ and $B$ are. We may regard $A \otimes B$ as the simultaneous application of $A$ to the first 
register and $B$ to the second register. For example, the $n$-fold tensor product $H^{\otimes n}$ denotes the 
unitary that applies the one-qubit Hadamard gate to each qubit of an $n$-qubit register. This maps 
any basis state $|x\rangle$ to 

$$H^{\otimes n}|x\rangle = \frac{1}{\sqrt{2^n}} \sum_{y \in \{0,1\}^n} (-1)^{x \cdot y} |y\rangle$$

(and vice versa, since $H$ happens to be its own inverse). Here $x \cdot y = \sum_{i=1}^{n} x_i y_i$ denotes the inner 
product of bit strings. 
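
The unitaries of this subsection are small enough to build explicitly. The following sketch, which assumes matrices stored as lists of rows and uses helper names invented for this example, constructs the tensor product of matrices, checks the formula for $H^{\otimes n}|x\rangle$ entry by entry for $n = 3$, and builds a controlled-$U$ operation as $|0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes U$, which is one way of writing the map $V$ defined above.

```python
import math
from itertools import product

def kron(A, B):
    """Tensor product of matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def is_unitary(M, tol=1e-12):
    """Check that the columns of M are orthonormal, i.e. M* M = I."""
    d = len(M)
    return all(
        abs(sum(M[k][i].conjugate() * M[k][j] for k in range(d)) - (i == j)) < tol
        for i in range(d) for j in range(d)
    )

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]

def hadamard_n(n):
    """n-fold tensor product of H; the first qubit is the most significant bit."""
    M = [[1.0]]
    for _ in range(n):
        M = kron(M, H)
    return M

def controlled(U):
    """|0><0| (x) I + |1><1| (x) U: apply U to the target only if the control is 1."""
    d = len(U)
    I = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    A = kron([[1, 0], [0, 0]], I)
    B = kron([[0, 0], [0, 1]], U)
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

n = 3
Hn = hadamard_n(n)
bits = list(product([0, 1], repeat=n))  # bit strings in the same order as the basis
formula_ok = all(
    abs(Hn[yi][xi] - (-1) ** sum(a * b for a, b in zip(x, y)) / math.sqrt(2 ** n)) < 1e-12
    for xi, x in enumerate(bits) for yi, y in enumerate(bits)
)

CH = controlled(H)
out = apply(CH, [0, 0, 1, 0])  # input |1>|0>: the control is set, so H acts on |0>
print(formula_ok, is_unitary(Hn), is_unitary(CH), out)
```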



Measurement Quantum mechanics is distinctive for having measurement built-in as a 
fundamental notion, at least in most formulations. A measurement is a way to obtain information about 
the measured quantum system. It takes as input a quantum state and outputs classical data (the 
"measurement outcome"), along with a new quantum state. It is an inherently probabilistic process 
that affects the state being measured. Various types of measurements on systems are possible. In 
the simplest kind, known as measurement in the computational basis, we measure a pure state 

$$|\phi\rangle = \sum_{i=1}^{d} \alpha_i |i\rangle$$

and see the basis state $|i\rangle$ with probability $p_i$ equal to the squared amplitude $|\alpha_i|^2$ (or more 
accurately, the squared modulus of the amplitude; it is often convenient to just call this the squared 
amplitude). Since the state is a unit vector these outcome probabilities sum to 1, as they should. 
After the measurement, the state has changed to the observed basis state $|i\rangle$. Note that if we apply 
the measurement now a second time, we will observe the same $|i\rangle$ with certainty, as if the first 
measurement forced the quantum state to "make up its mind." 

A more general type of measurement is the projective measurement, also known as Von Neumann 
measurement. A projective measurement with $k$ outcomes is specified by $d \times d$ projector matrices 
$P_1, \ldots, P_k$ that form an orthogonal decomposition of the $d$-dimensional space. That is, $P_i P_j = \delta_{ij} P_i$, 
and $\sum_{i=1}^{k} P_i = I$ is the identity operator on the whole space. Equivalently, there exist 
orthonormal vectors $v_1, \ldots, v_d$ and a partition $S_1 \cup \cdots \cup S_k$ of $\{1, \ldots, d\}$ such that $P_i = \sum_{j \in S_i} |v_j\rangle\langle v_j|$ 
for all $i \in [k]$. With some abuse of notation we can identify $P_i$ with the subspace onto which it 
projects, and write the orthogonal decomposition of the complete space as 

$$\mathbb{C}^d = P_1 \oplus P_2 \oplus \cdots \oplus P_k.$$

Correspondingly, we can write $|\phi\rangle$ as the sum of its components in the $k$ subspaces: 

$$|\phi\rangle = P_1|\phi\rangle + P_2|\phi\rangle + \cdots + P_k|\phi\rangle.$$

A measurement probabilistically picks out one of these components: the probability of outcome $i$ 
is $\|P_i|\phi\rangle\|^2$, and if we got outcome $i$ then the state changes to the new unit vector $P_i|\phi\rangle / \|P_i|\phi\rangle\|$ 
(which is the component of $|\phi\rangle$ in the $i$-th subspace, renormalized). 

An important special case of projective measurements is measurement relative to the 
orthonormal basis $\{|v_i\rangle\}$, where each projector $P_i$ projects onto a 1-dimensional subspace spanned by the 
unit vector $|v_i\rangle$. In this case we have $k = d$ and $P_i = |v_i\rangle\langle v_i|$. A measurement in the computational 
basis corresponds to the case where $P_i = |i\rangle\langle i|$. If $|\phi\rangle = \sum_i \alpha_i |i\rangle$ then we indeed recover the squared 
amplitude: $p_i = \|P_i|\phi\rangle\|^2 = |\alpha_i|^2$. 

One can also apply a measurement to part of a state, for instance to the first register of a 
2-register quantum system. Formally, we just specify a $k$-outcome projective measurement for the 
first register, and then tensor each of the $k$ projectors with the identity operator on the second 
register to obtain a $k$-outcome measurement on the joint space. 

Looking back at our definitions, we observe that if two quantum states $|\phi\rangle, |\psi\rangle$ satisfy $\alpha|\phi\rangle = |\psi\rangle$ 
for some scalar $\alpha$ (necessarily of unit norm), then for any system of projectors $\{P_i\}$, $\|P_i|\phi\rangle\|^2 = \|P_i|\psi\rangle\|^2$ 
and so measuring $|\phi\rangle$ with $\{P_i\}$ yields the same distribution as measuring $|\psi\rangle$. More 
is true: if we make any sequence of transformations and measurements to the two states, the 
sequences of measurement outcomes we see are identically distributed. Thus the two states are 
indistinguishable, and we generally regard them as the same state. 
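
The measurement rules above can be traced on a small example. The sketch below is restricted, as a simplifying assumption, to projectors onto coordinate subspaces (that is, the vectors $v_j$ are the standard basis vectors, indexed from 0 in the code). It computes the outcome probabilities $\|P_i|\phi\rangle\|^2$, the renormalized post-measurement state, and checks that multiplying the state by a unit-norm scalar leaves the outcome distribution unchanged.

```python
import cmath
import math

def project(block, state):
    """Apply the projector onto span{|j> : j in block} (a coordinate subspace)."""
    return [a if j in block else 0 for j, a in enumerate(state)]

def measure_distribution(partition, state):
    """Outcome probabilities ||P_i |phi>||^2 for the projectors given by a partition."""
    return [sum(abs(a) ** 2 for a in project(block, state)) for block in partition]

def post_measurement_state(block, state):
    """Renormalized component P_i|phi> / ||P_i|phi>||, assuming the outcome has nonzero probability."""
    comp = project(block, state)
    length = math.sqrt(sum(abs(a) ** 2 for a in comp))
    return [a / length for a in comp]

phi = [0.5, 0.5j, -0.5, 0.5]       # a unit vector in C^4
partition = [{0, 1}, {2}, {3}]     # three projectors of ranks 2, 1, 1
probs = measure_distribution(partition, phi)
post = post_measurement_state({0, 1}, phi)

# The same state up to a scalar of unit norm gives the same outcome distribution.
psi = [cmath.exp(0.7j) * a for a in phi]
probs_psi = measure_distribution(partition, psi)

print(probs, probs_psi, post)
```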

Quantum-classical analogy For the uninitiated, these high-dimensional complex vectors and 
unitary transformations may seem baffling. One helpful point of view is the analogy with classical 
random processes. In the classical world, the evolution of a probabilistic automaton whose state 
consists of $n$ bits can be modeled as a sequence of $2^n$-dimensional vectors $\pi^1, \pi^2, \ldots$. Each $\pi^t$ is a 
probability distribution on $\{0,1\}^n$, where $\pi^t_x$ gives the probability that the automaton is in state 
$x$ if measured at time $t$ ($\pi^1$ is the starting state). The evolution from time $t$ to $t + 1$ is describable 
by a matrix equation $\pi^{t+1} = M_t \pi^t$, where $M_t$ is a $2^n \times 2^n$ stochastic matrix, that is, a matrix 
that always maps probability vectors to probability vectors. The final outcome of the computation 
is obtained by sampling from the last probability distribution. The quantum case is similar: an 
$n$-qubit state is a $2^n$-dimensional vector, but now it is a vector of complex numbers whose squared moduli 
sum to 1. A transformation corresponds to a $2^n \times 2^n$ matrix, but now it is a matrix that preserves 
the sum of squared moduli of the entries. Finally, a measurement in the computational basis obtains the 
final outcome by sampling from the distribution given by the squared moduli of the entries of the vector. 
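
A two-state example makes the analogy tangible. In this sketch (the particular stochastic matrix is an arbitrary choice made for illustration) a column-stochastic matrix keeps the entries of a probability vector summing to 1, while the Hadamard matrix keeps the squared moduli of the amplitudes summing to 1. Applying the Hadamard matrix twice also shows one difference from the classical case: amplitudes can cancel, returning the state to $|0\rangle$.

```python
import math

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Classical: a column-stochastic matrix maps probability vectors to probability vectors.
M = [[0.9, 0.5],
     [0.1, 0.5]]
pi = [1.0, 0.0]
for _ in range(3):
    pi = apply(M, pi)
classical_total = sum(pi)

# Quantum: a unitary preserves the sum of squared moduli of the amplitudes.
s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
state = apply(H, [1.0, 0.0])                     # (|0> + |1>)/sqrt(2)
quantum_total = sum(abs(a) ** 2 for a in state)
back = apply(H, state)                           # the |1> amplitudes cancel

print(classical_total, quantum_total, back)
```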

2.2 Quantum information and its limitations 

Quantum information theory studies the quantum generalizations of familiar notions from 
classical information theory such as Shannon entropy, mutual information, channel capacities, etc. In 
Section 3 we give several examples where quantum information theory is used to say something 
about various non-quantum systems. The quantum information-theoretic results we need all have 
the same flavor: they say that a low-dimensional quantum state (i.e., a small number of qubits) 
cannot contain too much accessible information. 

Holevo's Theorem The mother of all such results is Holevo's theorem from 1973 [58], which 
predates the area of quantum computation by many years. Its proper technical statement is in terms 
of a quantum generalization of mutual information, but the following consequence of it (derived by 
Cleve et al. [39]) about two communicating parties, suffices for our purposes. 

Theorem 2.1 (Holevo, CDNT) If Alice wants to send $n$ bits of information to Bob via a 
quantum channel (i.e., by exchanging quantum systems), and they do not share an entangled state, then 
they have to exchange at least n qubits. If they are allowed to share unlimited prior entanglement, 
then Alice has to send at least n/2 qubits to Bob, no matter how many qubits Bob sends to Alice. 

This theorem is slightly imprecisely stated here, but the intuition is very clear: the first part of 
the theorem says that if we encode some classical random variable $X$ in an $m$-qubit state,¹ then 
no measurement on the quantum state can give more than $m$ bits of information about $X$. More 
precisely: the classical mutual information between $X$ and the classical measurement outcome $M$ 
on the $m$-qubit system is at most $m$. If we encoded the classical information in an $m$-bit system 
instead of an $m$-qubit system this would be a trivial statement, but the proof of Holevo's theorem 
is quite non-trivial. Thus we see that an $m$-qubit state, despite somehow "containing" $2^m$ complex 
amplitudes, is no better than $m$ classical bits for the purpose of storing information (this is in the 
absence of prior entanglement; if Alice and Bob do share entanglement, then $m$ qubits are no better 
than $2m$ classical bits). 

Low-dimensional encodings Here we provide a "poor man's version" of Holevo's theorem due 
to Nayak [96, Theorem 2.4.2], which has a simple proof and often suffices for applications. Suppose 
we have a classical random variable $X$, uniformly distributed over $[N] = \{1, \ldots, N\}$. Let $x \mapsto |\phi_x\rangle$ 
be some encoding of $[N]$, where $|\phi_x\rangle$ is a pure state in a $d$-dimensional space. Let $P_1, \ldots, P_N$ be 
the measurement operators applied for decoding; these sum to the $d$-dimensional identity operator. 
Then the probability of correct decoding in case $X = x$ is 

$$p_x = \|P_x|\phi_x\rangle\|^2 \le \mathrm{Tr}(P_x).$$

The sum of these success probabilities is at most 

$$\sum_{x=1}^{N} p_x \le \sum_{x=1}^{N} \mathrm{Tr}(P_x) = \mathrm{Tr}\Big(\sum_{x=1}^{N} P_x\Big) = \mathrm{Tr}(I) = d.$$

In other words, if we are encoding one of $N$ classical values in a $d$-dimensional quantum state, then 
any measurement to decode the encoded classical value has average success probability at most 
$d/N$ (uniformly averaged over all $N$ values that we can encode).² This is optimal. For example, if 
we encode $n$ bits into $m$ qubits, we will have $N = 2^n$, $d = 2^m$, and the average success probability 
of decoding is at most $2^m/2^n$. 

¹ Via an encoding map $x \mapsto |\phi_x\rangle$; we generally use capital letters like $X$ to denote random variables, lower case 
like $x$ to denote specific values. 

² For projective measurements the statement is somewhat trivial, since in a $d$-dimensional space one can have at 
most $d$ non-zero orthogonal projectors. However, the same proof works for the most general states and measurements 
that quantum mechanics allows: so-called mixed states (probability distributions over pure states) and POVMs (which 
are measurements where the operators $P_1, \ldots, P_k$ need not be projectors, but can be general positive semidefinite 
matrices summing to $I$); see Appendix A for these notions. 

Random access codes The previous two results dealt with the situation where we encoded a 
classical random variable $X$ in some quantum system, and would like to recover the original value 
$X$ by an appropriate measurement on that quantum system. However, suppose $X = X_1 \ldots X_n$ is 
a string of $n$ bits, uniformly distributed and encoded by a map $x \mapsto |\phi_x\rangle$, and it suffices for us if 
we are able to decode individual bits $X_i$ from this with some probability $p > 1/2$. More precisely, 
for each $i \in [n]$ there should exist a measurement $\{M_i, I - M_i\}$ allowing us to recover $x_i$: for each 
$x \in \{0,1\}^n$ we should have $\|M_i|\phi_x\rangle\|^2 \ge p$ if $x_i = 1$ and $\|M_i|\phi_x\rangle\|^2 \le 1 - p$ if $x_i = 0$. An encoding 
satisfying this is called a quantum random access code, since it allows us to choose which bit of $X$ 
we would like to access. Note that the measurement to recover $x_i$ can change the state $|\phi_x\rangle$, so 
generally we may not be able to decode more than one bit of $x$. 

An encoding that allows us to recover an $n$-bit string requires about $n$ qubits by Holevo. Random 
access codes only allow us to recover each of the $n$ bits. Can they be much shorter? In small cases 
they can be: for instance, one can encode two classical bits into one qubit, in such a way that 
each of the two bits can be recovered with success probability 85% from that qubit [11]. However, 
Nayak [96] proved that asymptotically quantum random access codes cannot be much shorter than 
classical (improving upon an $m = \Omega(n/\log n)$ lower bound from [17]). 

Theorem 2.2 (Nayak) Let $x \mapsto |\phi_x\rangle$ be a quantum random access encoding of $n$-bit strings into 
$m$-qubit states such that, for each $i \in [n]$, we can decode $X_i$ from $|\phi_X\rangle$ with success probability $p$ 
(over a uniform choice of $X$ and the measurement randomness). Then $m \ge (1 - H(p))n$, where 
$H(p) = -p \log p - (1 - p) \log(1 - p)$ is the binary entropy function. 

In fact the success probabilities need not be the same for all $X_i$; if we can decode each $X_i$ with 
success probability $p_i > 1/2$, then the lower bound on the number of qubits is $m \ge \sum_{i=1}^{n} (1 - H(p_i))$. 
The intuition of the proof is quite simple: since the quantum state allows us to predict the bit $X_i$ 
with probability $p_i$, it reduces the "uncertainty" about $X_i$ from 1 bit to $H(p_i)$ bits. Hence it 
contains at least $1 - H(p_i)$ bits of information about $X_i$. Since all $n$ $X_i$'s are independent, the state 
has to contain at least $\sum_{i=1}^{n} (1 - H(p_i))$ bits about $X$ in total. For more technical details see [96] 
or Appendix B of [70]. The lower bound on $m$ can be achieved up to an additive $O(\log n)$ term, 
even by classical probabilistic encodings. 
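
The small case mentioned above, two bits in one qubit with success probability about 85%, can be verified numerically. The sketch below uses one well-known construction, assumed here because the text does not spell out the encoding: the four strings are mapped to real qubit states at angles $\pi/8, 3\pi/8, 5\pi/8, 7\pi/8$, the first bit is decoded with $M_1 = |1\rangle\langle 1|$ and the second with $M_2 = |-\rangle\langle -|$, where $|-\rangle = (|0\rangle - |1\rangle)/\sqrt{2}$. It then compares the result with the lower bound $m \ge (1 - H(p))n$ of Theorem 2.2 for $n = 2$, $m = 1$.

```python
import math

# Assumed encoding: x = (x1, x2) -> cos(theta)|0> + sin(theta)|1>.
THETA = {(0, 0): math.pi / 8, (1, 0): 3 * math.pi / 8,
         (1, 1): 5 * math.pi / 8, (0, 1): 7 * math.pi / 8}

def encode(x):
    t = THETA[x]
    return (math.cos(t), math.sin(t))

def prob_guess_one(i, state):
    """||M_i |phi>||^2 with M_1 = |1><1| and M_2 = |-><-| (an assumed choice of measurements)."""
    a0, a1 = state
    if i == 1:
        return a1 ** 2
    return ((a0 - a1) / math.sqrt(2)) ** 2

def success(i, x):
    """Probability of decoding bit x_i correctly from the encoding of x."""
    p_one = prob_guess_one(i, encode(x))
    return p_one if x[i - 1] == 1 else 1 - p_one

def binary_entropy(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

successes = [success(i, x) for x in THETA for i in (1, 2)]
p = min(successes)                          # worst-case success probability
nayak_bound = (1 - binary_entropy(p)) * 2   # lower bound on m for n = 2 bits

print(p, nayak_bound)
```

The bound evaluates to a value below 1, so encoding two bits in a single qubit at this success probability is consistent with the theorem.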

2.3 Quantum query algorithms 

Different models for quantum algorithms exist. Most relevant for our purposes are the quantum 
query algorithms, which may be viewed as the quantum version of classical decision trees. We will 
give a basic introduction here, referring to [M] for more details. The model and results of this 
section will not be needed until Section 4, and the reader might want to defer reading this until 
they get there. 

The query model In this model, the goal is to compute some function $f : A^n \to B$ on a given 
input $x \in A^n$. The simplest and most common case is $A = B = \{0,1\}$. The distinguishing 
feature of the query model is the way $x$ is accessed: $x$ is not given explicitly, but is stored in a 
random access memory, and we are being charged unit cost for each query that we make to this 
memory. Informally, a query asks for and receives the $i$-th element $x_i$ of the input. Formally, we 
model a query unitarily as the following 2-register quantum operation $O_x$, where the first register 
is $n$-dimensional and the second is $|A|$-dimensional: 

$$O_x : |i, b\rangle \mapsto |i, b + x_i\rangle,$$

where for simplicity we identify $A$ with the additive group $\mathbb{Z}_{|A|}$, i.e., addition is modulo $|A|$. In 
particular, $|i, 0\rangle \mapsto |i, x_i\rangle$. This only states what $O_x$ does on basis states, but by linearity determines 
the full unitary. Note that a quantum algorithm can apply $O_x$ to a superposition of basis states; 
this gives us a kind of simultaneous access to multiple input variables. 
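
The action of $O_x$ is simple enough to simulate directly. This is a minimal sketch for the case $A = \{0,1\}$, with a state represented as a dictionary from basis labels $(i, b)$ to amplitudes; the representation, the example input, and the 0-based indexing of $i$ are choices made here for illustration.

```python
import math

def query(x, state):
    """Apply O_x : |i, b> -> |i, b + x_i mod 2> to a state given as {(i, b): amplitude}."""
    out = {}
    for (i, b), amp in state.items():
        key = (i, (b + x[i]) % 2)
        out[key] = out.get(key, 0) + amp
    return out

x = [1, 0, 1, 1]  # hypothetical 4-bit input
n = len(x)

# One query on a basis state reads a single bit: |i, 0> -> |i, x_i>.
single = query(x, {(2, 0): 1.0})

# One query on a uniform superposition over all indices involves every x_i at once.
uniform = {(i, 0): 1 / math.sqrt(n) for i in range(n)}
after = query(x, uniform)

# O_x permutes basis states, so it preserves the norm; for A = {0, 1} it is its own inverse.
norm_sq = sum(abs(a) ** 2 for a in after.values())
twice = query(x, after)

print(single, sorted(after), norm_sq)
```

Note that measuring the superposed state after one query still reveals only one pair $(i, x_i)$; the benefit of superposed queries comes from the interference produced by the unitaries applied around them.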

A $T$-query quantum algorithm starts in a fixed state, say the all-0 state $|0 \ldots 0\rangle$, and then 
interleaves fixed unitary transformations $U_0, U_1, \ldots, U_T$ with queries. It is possible that the algorithm's 
fixed unitaries act on a workspace-register, in addition to the two registers on which $O_x$ acts. In 
this case we implicitly extend $O_x$ by tensoring it with the identity operation on this extra register. 
Hence the final state of the algorithm can be written as the following matrix-vector product: 

$$U_T O_x U_{T-1} O_x \cdots O_x U_1 O_x U_0 |0 \ldots 0\rangle.$$

This state depends on the input $x$ only via the $T$ queries. The output of the algorithm is obtained 
by a measurement of the final state. For instance, if the output is Boolean, the algorithm could 
just measure the final state in the computational basis and output the first bit of the result. 

The query complexity of some function $f$ is now defined to be the minimal number of queries 
needed for an algorithm that outputs the correct value $f(x)$ for every $x$ in the domain of $f$ (with 
error probability at most some fixed value $\varepsilon$). We just count queries to measure the complexity 
of the algorithm, while the intermediate fixed unitaries are treated as costless. In many cases, 
including all the ones in this paper, the overall computation time of quantum query algorithms 
(as measured by the total number of elementary gates, say) is not much bigger than the query 
complexity. This justifies analyzing the latter as a proxy for the former. 

Examples of quantum query algorithms Here we list a number of quantum query algorithms 
that we will need in later sections. All of these algorithms outperform the best classical algorithms 
for the given task. 

• Grover's algorithm [55] searches for a "solution" in a given $n$-bit input $x$, i.e., an index $i$ 
such that $x_i = 1$. The algorithm uses $O(\sqrt{n})$ queries, and if there is at least one solution in 
$x$ then it finds one with probability at least 1/2. Classical algorithms for this task, including 
randomized ones, require $\Omega(n)$ queries. 

• ε-error search: If we want to reduce the error probability in Grover's search algorithm to 
some small $\varepsilon$, then $\Theta(\sqrt{n \log(1/\varepsilon)})$ queries are necessary and sufficient [30]. Note that this 
is more efficient than the standard amplification that repeats Grover's algorithm $O(\log(1/\varepsilon))$ 
times. 

• Exact search: If we know there are exactly $t$ solutions in our space (i.e., $|x| = t$), then a 
variant of Grover's algorithm finds a solution with probability 1 using $O(\sqrt{n/t})$ queries [29]. 

• Finding all solutions: If we know an upper bound $t$ on the number of solutions (i.e., 
$|x| \le t$), then we can find all of them with probability 1 using $\sum_{i=1}^{t} O(\sqrt{n/i}) = O(\sqrt{tn})$ 
queries. 

• Quantum counting: The algorithm Count$(x, T)$ of [5S] approximates the total number of 
solutions. It takes as input an $x \in \{0,1\}^n$, makes $T$ quantum queries to $x$, and outputs an 
estimate $\tilde{t} \in [0, n]$ to $t = |x|$, the Hamming weight of $x$. For $j \ge 1$ we have the following 
concentration bound, implicit in [29]: $\Pr[|\tilde{t} - t| \ge jn/T] = O(1/j)$. For example, using 
$T = O(\sqrt{n})$ quantum queries we can, with high probability, approximate $t$ up to additive 
error of $O(\sqrt{n})$. 
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• Search on bounded-error inputs: Suppose the bits not given by a perfect 

oracle O x , but by an imperfect one: 

O x : |i, 6, 0) i-> y/l - Si \i, b © x i} Wi) + y/ei \i, b © x^w'j) , 

where we know s, we know that £j < e for each x and i, but we do not know the actual values 
of the Ej (which may depend on x), or of the "workspace" states \wi) and \w[). We call this an 
e -bounded- error quantum oracle. This situation arises, for instance, when each bit Xj is itself 
computed by some bounded-error quantum algorithm. Given the ability to apply O x as well 
as its inverse O" 1 , we can still find a solution with high probability using 0{y/n) queries [60] . 
If the unknown number of solutions is t, then we can still find one with high probability using 
0(\Jn/t) queries. 
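To make the query model and the first item of the list concrete, here is a minimal classical simulation of Grover's algorithm on a state vector. It is only a sketch under stated assumptions: it uses the phase form of the query, $|i\rangle \mapsto (-1)^{x_i}|i\rangle$, which is a standard equivalent of the bit-flip oracle, and the function name and the choice of $\lfloor (\pi/4)\sqrt{n/t} \rfloor$ iterations are ours rather than taken from [55].

```python
import math

def grover_success_probability(x):
    """Simulate Grover search on the bit list x.

    Returns (number of queries used, probability of measuring a solution).
    """
    n = len(x)
    t = sum(x)  # number of solutions; used here only to pick the iteration count
    if t == 0:
        return 0, 0.0
    # Fixed starting unitary: prepare the uniform superposition over all indices.
    amp = [1.0 / math.sqrt(n)] * n
    queries = int(math.floor((math.pi / 4) * math.sqrt(n / t)))
    for _ in range(queries):
        # One query in phase form: flip the sign of the amplitudes of solutions.
        amp = [-a if x[i] else a for i, a in enumerate(amp)]
        # Fixed unitary between queries: inversion about the mean.
        mean = sum(amp) / n
        amp = [2 * mean - a for a in amp]
    prob = sum(a * a for i, a in enumerate(amp) if x[i])
    return queries, prob

x = [0] * 64
x[37] = 1
queries, prob = grover_success_probability(x)
print(queries, round(prob, 3))
```

The simulation itself takes time linear in $n$ per iteration, of course; the point is only that the number of queries stays below $\sqrt{n}$ while the success probability is close to 1.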

From quantum query algorithms to polynomials. An $n$-variate multilinear polynomial $p$ is
a function $p : \mathbb{C}^n \to \mathbb{C}$ that can be written as

$$p(x_1, \ldots, x_n) = \sum_{S \subseteq [n]} a_S \prod_{i \in S} x_i,$$

for some complex numbers $a_S$. The degree of $p$ is $\deg(p) = \max\{|S| : a_S \neq 0\}$. It is well known
(and easy to show) that every function $f : \{0,1\}^n \to \mathbb{C}$ has a unique representation as such a
polynomial; $\deg(f)$ is defined as the degree of that polynomial. For example, the 2-bit AND
function is $p(x_1, x_2) = x_1 x_2$, and the 2-bit Parity function is $p(x_1, x_2) = x_1 + x_2 - 2x_1 x_2$. Both
polynomials have degree 2.
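The unique multilinear representation can be computed explicitly by Möbius inversion over subsets, $a_S = \sum_{T \subseteq S} (-1)^{|S| - |T|} f(1_T)$, where $1_T$ is the indicator vector of $T$. The following sketch does this by brute force; the function names are ours, and variables are indexed from 0, so the tuple `(0, 1)` stands for the monomial $x_1 x_2$.

```python
from itertools import combinations

def multilinear_coefficients(f, n):
    """Return {S: a_S} such that f(x) = sum_S a_S * prod_{i in S} x_i on {0,1}^n.

    S is a tuple of 0-based variable indices; zero coefficients are omitted.
    """
    coeffs = {}
    for size in range(n + 1):
        for S in combinations(range(n), size):
            a = 0
            # Moebius inversion: alternate signs over all subsets T of S.
            for k in range(size + 1):
                for T in combinations(S, k):
                    point = tuple(1 if i in T else 0 for i in range(n))
                    a += (-1) ** (size - k) * f(point)
            if a != 0:
                coeffs[S] = a
    return coeffs

def degree(coeffs):
    """Degree of the polynomial: the size of the largest monomial present."""
    return max((len(S) for S in coeffs), default=0)

and2 = multilinear_coefficients(lambda x: x[0] & x[1], 2)
parity2 = multilinear_coefficients(lambda x: x[0] ^ x[1], 2)
print(and2, degree(and2))
print(parity2, degree(parity2))
```

This brute-force version takes time exponential in $n$ and is meant only for checking small examples such as the two above.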

For the purposes of this survey, the crucial property of efficient quantum query algorithms is
that the amplitudes of their final state are low-degree polynomials of $x$ [49, 23]. More precisely:

Lemma 2.3 Consider a $T$-query algorithm with input $x \in \{0,1\}^n$ acting on an $m$-qubit space.
Then its final state can be written as

$$\sum_{z \in \{0,1\}^m} \alpha_z(x) |z\rangle,$$

where each $\alpha_z$ is a multilinear polynomial in $x$ of degree at most $T$.

Proof. The proof is by induction on $T$. The base case ($T = 0$) trivially holds: the algorithm's
starting state is independent of $x$, so its amplitudes are polynomials of degree 0.

For the induction step, note that a fixed linear transformation does not increase the degree of
the amplitudes (the new amplitudes are linear combinations of the old amplitudes), while a query
to $x$ corresponds to the following map:

$$\alpha_{i,0,w}|i,0,w\rangle + \alpha_{i,1,w}|i,1,w\rangle \mapsto \left((1 - x_i)\alpha_{i,0,w} + x_i \alpha_{i,1,w}\right)|i,0,w\rangle + \left(x_i \alpha_{i,0,w} + (1 - x_i)\alpha_{i,1,w}\right)|i,1,w\rangle,$$

which increases the degree of the amplitudes by at most 1: if $\alpha_{i,0,w}$ and $\alpha_{i,1,w}$ are polynomials in
$x$ of degree at most $d$, then the new amplitudes are polynomials of degree at most $d + 1$. Since our
inputs are 0/1-valued, we can drop higher degrees and assume without loss of generality that the
resulting polynomials are multilinear. □

If we measure the first qubit of the final state and output the resulting bit, then the probability
of output 1 is given by

$$\sum_{z \in \{0,1\}^m :\, z_1 = 1} |\alpha_z(x)|^2,$$

which is a real-valued polynomial of $x$ of degree at most $2T$. This is true more generally:

Corollary 2.4 Consider a $T$-query algorithm with input $x \in \{0,1\}^n$. Then the probability of a
specific output is a multilinear polynomial in $x$ of degree at most $2T$.

This connection between quantum query algorithms and polynomials has mostly been used as 
a tool for lower bounds [23} El [TJ E3] : if one can show that every polynomial that approximates a 
function $f : \{0,1\}^n \to \{0,1\}$ has degree at least $d$, then every quantum algorithm computing $f$
with small error must use at least $d/2$ queries. We give one example in this spirit in Section 4.5,
in which a version of the polynomial method yielded a breakthrough in classical lower bounds.
However, most of the applications in this survey (in Section 4) work in the other direction: they
view quantum algorithms as a means for constructing polynomials with certain desirable properties.

3 Using quantum information theory 

The results in this section all use quantum information-theoretic bounds to say something about 
non-quantum objects. 

3.1 Communication lower bound for inner product 

The first surprising application of quantum information theory to another area was in communi- 
cation complexity. The basic scenario in this area models 2-party distributed computation: Alice 
receives some $n$-bit input $x$, Bob receives some $n$-bit input $y$, and together they want to compute
some Boolean function $f(x, y)$, the value of which Bob is required to output (with high
probability, in the case of bounded-error protocols). The resource to be minimized is the amount of
communication between Alice and Bob, whence the name communication complexity. This model
was introduced by Yao [130], and a good overview of (non-quantum) results and applications may
be found in the book of Kushilevitz and Nisan [76]. The area is interesting in its own right as a 
basic complexity measure for distributed computing, but has also found many applications as a 
tool for lower bounds in areas like data structures, Turing machine complexity, etc. The quantum 
generalization is quite straightforward: now Alice and Bob can communicate qubits, and possibly 
start with an entangled state. See [124] for more details and a survey of results. 

One of the most studied communication complexity problems is the inner product problem,
where the function to be computed is the inner product of $x$ and $y$ modulo 2, i.e.,
$\mathrm{IP}(x, y) = \sum_{i=1}^{n} x_i y_i \bmod 2$. Clearly, $n$ bits of communication suffice for any function: Alice
can just send $x$. However, IP is a good example where one can prove that nearly $n$ bits of
communication are also necessary. The usual proof for this result is based on the combinatorial notion
of "discrepancy," but below we give an alternative quantum-based proof due to Cleve et al. [39].

Intuitively, it seems that unless Alice gives Bob a lot of information about $x$, he will not be able
to guess the value of $\mathrm{IP}(x, y)$. However, in general it is hard to directly lower bound communication
complexity by information, since we really require Bob to produce only one bit of output.$^3$ The
very elegant proof of [39] uses quantum effects to get around this problem: it converts a protocol
(quantum or classical) that computes IP into a quantum protocol that communicates $x$ from Alice
to Bob. Holevo's theorem then easily lower bounds the amount of communication of the latter
protocol by the length of $x$. This goes as follows. Suppose Alice and Bob have some protocol for
IP, say it uses $c$ bits of communication. Suppose for simplicity that it has error probability 0. By
running the protocol, putting the answer bit $x \cdot y$ into a phase, and then reversing the protocol to
set its workspace back to its initial value, we can implement the following unitary mapping:

$$|x\rangle|y\rangle \mapsto |x\rangle (-1)^{x \cdot y} |y\rangle.$$

Note that this protocol now uses $2c$ bits of communication: $c$ going from Alice to Bob and $c$ going
from Bob to Alice. The trick is that we can run this unitary on a superposition of inputs, at a
cost of $2c$ qubits of communication. Suppose Alice starts with an arbitrary $n$-bit state $|x\rangle$ and Bob
starts with the uniform superposition $\frac{1}{\sqrt{2^n}} \sum_{y \in \{0,1\}^n} |y\rangle$. If they apply the above unitary, the final
state becomes

$$|x\rangle \frac{1}{\sqrt{2^n}} \sum_{y \in \{0,1\}^n} (-1)^{x \cdot y} |y\rangle.$$

If Bob now applies a Hadamard transform to each of his $n$ qubits, then he obtains the basis state
$|x\rangle$, so Alice's $n$ classical bits have been communicated to Bob. Theorem 2.1 now implies that Alice
must have sent at least $n/2$ qubits to Bob (even if Alice and Bob started with unlimited shared
entanglement). Hence $c \ge n/2$.
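The last step of this argument, that a Hadamard transform on Bob's register turns the phases $(-1)^{x \cdot y}$ into the basis state $|x\rangle$, is easy to check numerically. The sketch below does so for one 5-bit string; it simulates only Bob's register after the phase protocol has been applied, not the communication itself, and the function names are ours.

```python
import math

def hadamard_transform(state):
    """Apply a Hadamard gate to every qubit of a 2^n-dimensional state vector."""
    state = list(state)
    size = len(state)
    h = 1
    while h < size:
        for start in range(0, size, 2 * h):
            for j in range(start, start + h):
                a, b = state[j], state[j + h]
                state[j] = (a + b) / math.sqrt(2)
                state[j + h] = (a - b) / math.sqrt(2)
        h *= 2
    return state

def bob_state_after_phase_protocol(x, n):
    """Bob's register after |x>|y> -> |x>(-1)^{x.y}|y> acts on the uniform superposition."""
    norm = 1.0 / math.sqrt(2 ** n)
    # bin(x & y).count("1") % 2 is the inner product of x and y modulo 2.
    return [norm * (-1) ** (bin(x & y).count("1") % 2) for y in range(2 ** n)]

n, x = 5, 0b10110
final = hadamard_transform(bob_state_after_phase_protocol(x, n))
recovered = max(range(2 ** n), key=lambda y: abs(final[y]))
print(recovered == x, round(final[x] ** 2, 6))
```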

With some more technical complication, the same idea gives a linear lower bound on the com- 
munication of bounded-error protocols for IP. Nayak and Salzman [97] later obtained optimal 
bounds for quantum protocols computing IP. 

3.2 Lower bounds on locally decodable codes 

The development of error-correcting codes is one of the success stories of science in the second 
half of the 20th century. Such codes are eminently practical, and are widely used to protect 
information stored on discs, communication over channels, etc. From a theoretical perspective, 
there exist codes that are nearly optimal in a number of different respects simultaneously: they 
have constant rate, can protect against a constant noise-rate, and have linear-time encoding and 
decoding procedures. We refer to Trevisan's survey [119] for a complexity-oriented discussion of
codes and their applications. 

3 Still, there are also classical techniques to turn this information-theoretic intuition into communication complexity
lower bounds [36, 64, 21, 22, 63, 83].

One drawback of ordinary error-correcting codes is that we cannot efficiently decode small
parts of the encoded information. If we want to learn, say, the first bit of the encoded message
then we usually still need to decode the whole encoded string. This is relevant in situations where
we have encoded a very large string (say, a library of books, or a large database), but are only
interested in recovering small pieces of it at any given time. Dividing the data into small blocks
and encoding each block separately will not work: small chunks will be efficiently decodable but
not error-correcting, since a tiny fraction of well-placed noise could wipe out the encoding of one
chunk completely. There exist, however, error-correcting codes that are locally decodable, in the
sense that we can efficiently recover individual bits of the encoded string. These are defined as
follows [68]:

Definition 3.1 $C : \{0,1\}^n \to \{0,1\}^m$ is a $(q, \delta, \varepsilon)$-locally decodable code (LDC) if there is a
classical randomized decoding algorithm $A$ such that

1. $A$ makes at most $q$ queries to an $m$-bit string $y$ (non-adaptively).

2. For all $x$ and $i$, and all $y \in \{0,1\}^m$ with Hamming distance $d(C(x), y) \le \delta m$ we have

$\Pr[A^y(i) = x_i] \ge 1/2 + \varepsilon$.

The notation $A^y(i)$ reflects that the decoder $A$ has two different types of input. On the one
hand there is the (possibly corrupted) codeword $y$, to which the decoder has oracle access and from
which it can read at most $q$ bits of its choice. On the other hand there is the index $i$ of the bit that
needs to be recovered, which is known fully to the decoder.

The main question about LDCs is the tradeoff between the codelength $m$ and the number of
queries $q$ (which is a proxy for the decoding time). This tradeoff is still not very well understood.
We list the best known constructions here. On one extreme, regular error-correcting codes are
$(m, \delta, 1/2)$-LDCs, so one can have LDCs of linear length if one allows a linear number of queries.
Reed-Muller codes allow one to construct LDCs with $m = \mathrm{poly}(n)$ and $q = \mathrm{polylog}(n)$ [37]. For
constant $q$, the best constructions are due to Efremenko [42], improving upon Yekhanin [131]: for
$q = 2^r$ one can get codelength roughly $2^{2^{(\log n)^{1/r}}}$, and for $q = 3$ one gets roughly $2^{2^{\sqrt{\log n}}}$. For $q = 2$
there is the Hadamard code: given $x \in \{0,1\}^n$, define a codeword of length $m = 2^n$ by writing
down the bits $x \cdot z \bmod 2$, for all $z \in \{0,1\}^n$. One can decode $x_i$ with 2 queries as follows: choose
$z \in \{0,1\}^n$ uniformly at random and query the (possibly corrupted) codeword at indices $z$ and
$z \oplus e_i$, where the latter denotes the string obtained from $z$ by flipping its $i$-th bit. Individually,
each of these two indices is uniformly distributed. Hence for each of them, the probability that the
returned bit is corrupted is at most $\delta$. By the union bound, with probability at least $1 - 2\delta$, both
queries return the uncorrupted values. Adding these two bits modulo 2 gives the correct answer:

$$C(x)_z \oplus C(x)_{z \oplus e_i} = (x \cdot z) \oplus (x \cdot (z \oplus e_i)) = x \cdot e_i = x_i.$$

Thus the Hadamard code is a $(2, \delta, 1/2 - 2\delta)$-LDC of exponential length. Can we still do something
if we can make only one query instead of two? It turns out that 1-query LDCs do not exist once $n$
is sufficiently large [68].
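The 2-query decoder for the Hadamard code is short enough to write out in full. The following sketch encodes a random 8-bit string, corrupts a $\delta$-fraction of the codeword, and computes the exact success probability of the decoder by enumerating all choices of $z$. The random corruption pattern, the fixed seed, and all function names are illustrative assumptions; the guarantee in the text holds for every corruption pattern.

```python
import random

def hadamard_encode(x_bits):
    """Codeword of length 2^n whose entry z is x.z mod 2 (z read as an n-bit mask)."""
    n = len(x_bits)
    x = sum(bit << i for i, bit in enumerate(x_bits))
    return [bin(x & z).count("1") % 2 for z in range(2 ** n)]

def decode_bit(y, n, i, rng):
    """Two-query local decoder: query y at a random z and at z with bit i flipped."""
    z = rng.randrange(2 ** n)
    return y[z] ^ y[z ^ (1 << i)]

rng = random.Random(0)  # fixed seed so that the sketch is reproducible
n = 8
x_bits = [rng.randrange(2) for _ in range(n)]
codeword = hadamard_encode(x_bits)

# Corrupt at most a delta-fraction of the positions.
delta = 0.05
corrupted = list(codeword)
for pos in rng.sample(range(len(codeword)), int(delta * len(codeword))):
    corrupted[pos] ^= 1

# Exact success probability for bit i: average over all 2^n choices of z.
i = 3
exact_success = sum(
    (corrupted[z] ^ corrupted[z ^ (1 << i)]) == x_bits[i] for z in range(2 ** n)
) / 2 ** n
print(exact_success, 1 - 2 * delta)
```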

The only superpolynomial lower bound known on the length of LDCs is for the case of 2 queries:
there one needs an exponential codelength and hence the Hadamard code is essentially optimal.
This was first shown for linear 2-query LDCs by Goldreich et al. [52] via a combinatorial argument,
and then for general LDCs by Kerenidis and de Wolf [70] via a quantum argument.$^4$ The easiest
way to present this argument is to assume the following fact, which states a kind of "normal form"
for the decoder.

4 The best known lower bounds for general LDCs with $q > 2$ queries are only slightly superlinear. Those bounds,
and also the best known lower bounds for 2-server Private Information Retrieval schemes, are based on similar
quantum ideas [70, 122, 128]. The best known lower bound for 3-query LDCs is $m = \Omega(n^2/\log n)$ [128]; for linear
3-query LDCs, a slightly better lower bound of $m = \Omega(n^2)$ is known [129].

Fact 3.2 (Katz and Trevisan [68] + folklore) For every $(q, \delta, \varepsilon)$-LDC $C : \{0,1\}^n \to \{0,1\}^m$,
and for each $i \in [n]$, there exists a set $M_i$ of $\Omega(\delta\varepsilon m/q^2)$ disjoint tuples, each of at most $q$ indices
from $[m]$, and a bit $a_{i,t}$ for each tuple $t \in M_i$, such that the following holds:

$$\Pr_{x \in \{0,1\}^n}\left[ x_i = a_{i,t} \oplus \bigoplus_{j \in t} C(x)_j \right] \ge 1/2 + \Omega(\varepsilon/2^q), \qquad (4)$$

where the probability is taken uniformly over $x$. Hence to decode $x_i$ from $C(x)$, the decoder can just
query the indices in a randomly chosen tuple $t$ from $M_i$, outputting the sum of those $q$ bits and $a_{i,t}$.

Note that the above decoder for the Hadamard code is already of this form, with $M_i = \{(z, z \oplus e_i)\}$.
We omit the proof of Fact 3.2. It uses purely classical ideas and is not hard.

Now suppose $C : \{0,1\}^n \to \{0,1\}^m$ is a $(2, \delta, \varepsilon)$-LDC. We want to show that the codelength
$m$ must be exponentially large in $n$. Our strategy is to show that the following $m$-dimensional
quantum encoding is in fact a quantum random access code for $x$, with some success probability
$p > 1/2$:

$$x \mapsto |\phi_x\rangle = \frac{1}{\sqrt{m}} \sum_{j=1}^{m} (-1)^{C(x)_j} |j\rangle.$$

Theorem 2.2 then implies that the number of qubits of this state (which is $\lceil \log m \rceil$) is at least
$(1 - H(p))n = \Omega(n)$, and we are done.

Suppose we want to recover $x_i$ from $|\phi_x\rangle$. We turn each $M_i$ from Fact 3.2 into a measurement:
for each pair $(j, k) \in M_i$ form the projector $P_{jk} = |j\rangle\langle j| + |k\rangle\langle k|$, and let
$P_{\mathrm{rest}} = \sum_{j \notin \cup_{t \in M_i} t} |j\rangle\langle j|$
be the projector on the remaining indices. These $|M_i| + 1$ projectors sum to the $m$-dimensional
identity matrix, so they form a valid projective measurement. Applying this to $|\phi_x\rangle$ gives outcome
$(j, k)$ with probability $2/m$ for each $(j, k) \in M_i$, and outcome "rest" with probability $\rho = 1 - \Omega(\delta\varepsilon)$.
In the latter case we just output a fair coin flip as our guess for $x_i$. In the former case the state
has collapsed to the following useful superposition:

$$\frac{1}{\sqrt{2}}\left( (-1)^{C(x)_j}|j\rangle + (-1)^{C(x)_k}|k\rangle \right) = \frac{(-1)^{C(x)_j}}{\sqrt{2}}\left( |j\rangle + (-1)^{C(x)_j \oplus C(x)_k}|k\rangle \right).$$

Doing a 2-outcome measurement in the basis $\frac{1}{\sqrt{2}}(|j\rangle \pm |k\rangle)$ now gives us the value $C(x)_j \oplus C(x)_k$
with probability 1. By (4), if we add the bit $a_{i,(j,k)}$ to this, we get $x_i$ with probability at least
$1/2 + \Omega(\varepsilon)$. The success probability of recovering $x_i$, averaged over all $x$, is

$$p \ge \frac{1}{2}\rho + \left(\frac{1}{2} + \Omega(\varepsilon)\right)(1 - \rho) = \frac{1}{2} + \Omega(\delta\varepsilon^2).$$

Now $1 - H(1/2 + \eta) = \Theta(\eta^2)$ for $\eta \in [0, 1/2]$, so after applying Theorem 2.2 we obtain the following:

Theorem 3.3 (Kerenidis and de Wolf) If $C : \{0,1\}^n \to \{0,1\}^m$ is a $(2, \delta, \varepsilon)$-locally decodable
code, then $m = 2^{\Omega(\delta^2 \varepsilon^4 n)}$.
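The measurement used in this proof can be traced through for the Hadamard code, where $M_i$ consists of all pairs $(z, z \oplus e_i)$, there is no "rest" outcome, and every bit $a_{i,t}$ equals 0. The sketch below computes, for the uncorrupted state $|\phi_x\rangle$, the probability that the pair measurement followed by the $\pm$-basis measurement outputs $x_i$; the function names are ours, and the simulation tracks only real amplitudes, which suffices here.

```python
import math

def hadamard_codeword(x, n):
    """Hadamard encoding of the n-bit integer x: entry z is x.z mod 2."""
    return [bin(x & z).count("1") % 2 for z in range(2 ** n)]

def decode_from_state(x, n, i):
    """Probability of outputting x_i when decoding bit i from |phi_x>."""
    code = hadamard_codeword(x, n)
    m = len(code)
    phi = [(-1) ** c / math.sqrt(m) for c in code]  # amplitudes of |phi_x>
    x_i = (x >> i) & 1
    success = 0.0
    for j in range(m):
        k = j ^ (1 << i)
        if j > k:
            continue  # visit each pair (j, k) only once
        pair_prob = phi[j] ** 2 + phi[k] ** 2  # probability of outcome (j, k), equal to 2/m
        # Collapsed state on span{|j>, |k>}, renormalised.
        aj = phi[j] / math.sqrt(pair_prob)
        ak = phi[k] / math.sqrt(pair_prob)
        # Outcome "+" in the basis (|j> +- |k>)/sqrt(2) signals C(x)_j xor C(x)_k = 0.
        prob_plus = ((aj + ak) / math.sqrt(2)) ** 2
        prob_correct = prob_plus if x_i == 0 else 1.0 - prob_plus
        success += pair_prob * prob_correct
    return success

n = 4
worst = min(decode_from_state(x, n, i) for x in range(2 ** n) for i in range(n))
print(round(worst, 9))
```

For this particular code the $\log m = n$ qubits of $|\phi_x\rangle$ thus form a random access code with success probability 1, which matches the bound of Theorem 2.2 with equality.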

The dependence on $\delta$ and $\varepsilon$ in the exponent can be improved to $\delta\varepsilon^2$ [70]. This is still the only
superpolynomial lower bound known for LDCs. An alternative proof was found later [26], using
an extension of the Bonami-Beckner hypercontractive inequality. However, even that proof still
follows the outline of the above quantum-inspired proof, albeit in linear-algebraic language.

3.3 Rigidity of Hadamard matrices

In this section we describe an application of quantum information theory to matrix theory from [125].
Suppose we have some $n \times n$ matrix $M$, whose rank we want to reduce by changing a few entries.
The rigidity of $M$ measures the minimal number of entries we need to change in order to reduce
its rank to a given value $r$. This notion can be studied over any field, but we will focus here on $\mathbb{R}$
and $\mathbb{C}$. Formally:

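Definition 3.4 The rigidity of a matrix $M$ is the following function:

$$R_M(r) = \min\{ d(M, \widetilde{M}) : \mathrm{rank}(\widetilde{M}) \le r \},$$

where $d(M, \widetilde{M})$ counts the Hamming distance, i.e., the number of coordinates where $M$ and $\widetilde{M}$
differ. The bounded rigidity of $M$ is defined as

$$R_M(r, \theta) = \min\{ d(M, \widetilde{M}) : \mathrm{rank}(\widetilde{M}) \le r,\ \max_{x,y} |M_{x,y} - \widetilde{M}_{x,y}| \le \theta \}.$$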

Roughly speaking, high rigidity means that $M$'s rank is robust: changes in few entries will not
change the rank much. Rigidity was defined by Valiant [121, Section 6] in the 1970s with a view
to proving circuit lower bounds. In particular, he showed that an explicit $n \times n$ matrix $M$ with
$R_M(\varepsilon n) \ge n^{1+\delta}$ for $\varepsilon, \delta > 0$ would imply that log-depth arithmetic circuits (with linear gates) that
compute the linear map $M : \mathbb{R}^n \to \mathbb{R}^n$ need superlinear circuit size. This motivates trying to prove
lower bounds on rigidity for specific matrices. Clearly, $R_M(r) \ge n - r$ for every full-rank matrix
$M$, since reducing the rank by 1 requires changing at least one entry. This bound is optimal for
the identity matrix, but usually far from tight. Valiant showed that most matrices have rigidity
$(n - r)^2$, but finding an explicit matrix with high rigidity has been open for decades.$^5$ Similarly,
finding explicit matrices with strong lower bounds on bounded rigidity would have applications to
areas like communication complexity and learning theory [88, 90].

A very natural and widely studied class of candidates for such a high-rigidity matrix are the
Hadamard matrices. A Hadamard matrix is an $n \times n$ matrix $M$ with entries $+1$ and $-1$ that
is orthogonal (so $M^T M = nI$). Ignoring normalization, the $k$-fold tensor product of the matrix
from (1) is a Hadamard matrix with $n = 2^k$. (It is a longstanding conjecture that Hadamard
matrices exist if, and only if, $n$ equals 2 or a multiple of 4.)

Suppose we have a matrix $\widetilde{M}$ differing from the Hadamard matrix $M$ in $R$ positions such that
$\mathrm{rank}(\widetilde{M}) \le r$. The goal in proving high rigidity is to lower-bound $R$ in terms of $n$ and $r$. Alon [12]
proved $R = \Omega(n^2/r^2)$. This was later reproved by Lokam [88] using spectral methods. Kashin and
Razborov [67] improved this to $R \ge n^2/256r$. De Wolf [125] later rederived this bound using a
quantum argument, with a better constant. We present this argument next.

The quantum idea. The idea is to view the rows of an $n \times n$ matrix as a quantum encoding of $[n]$.
The rows of a Hadamard matrix $M$, after normalization by a factor $1/\sqrt{n}$, form an orthonormal set
of $n$-dimensional quantum states $|M_i\rangle$. If Alice sends Bob $|M_i\rangle$ and Bob measures the received state
with the projectors $P_j = |M_j\rangle\langle M_j|$, then he learns $i$ with probability 1, since $|\langle M_i|M_j\rangle|^2 = \delta_{ij}$. Of
course, nothing spectacular has been achieved by this: we just transmitted $\log n$ bits of information
by sending $\log n$ qubits.

5 Lokam [89] recently found an explicit $n \times n$ matrix with near-maximal rigidity; unfortunately his matrix has
fairly large, irrational entries, and is not sufficiently explicit for Valiant's purposes.

Now suppose that instead of $M$ we have some rank-$r$ $n \times n$ matrix $\widetilde{M}$ that is "close" to $M$ (we
are deliberately being vague about "close" here, since two different instantiations of the same idea
apply to the two versions of rigidity). Then we can still use the quantum states $|\widetilde{M}_i\rangle$ that correspond
to its normalized rows. Alice now sends the normalized $i$-th row of $\widetilde{M}$ to Bob. Crucially, she can do
this by means of an $r$-dimensional quantum state, as follows. Let $|v_1\rangle, \ldots, |v_r\rangle$ be an orthonormal
basis for the row space of $\widetilde{M}$. In order to transmit the normalized $i$-th row $|\widetilde{M}_i\rangle = \sum_{j=1}^{r} \alpha_j |v_j\rangle$,
Alice sends $\sum_{j=1}^{r} \alpha_j |j\rangle$ and Bob applies a unitary that maps $|j\rangle \mapsto |v_j\rangle$ to obtain $|\widetilde{M}_i\rangle$. He measures
this with the projectors $\{P_j\}$. Then his probability of getting the correct outcome $i$ is

$$p_i = |\langle M_i | \widetilde{M}_i \rangle|^2.$$

The "closer" $\widetilde{M}$ is to $M$, the higher these $p_i$'s are. But (3) in Section 2.2 tells us that the sum of
the $p_i$'s lower-bounds the dimension $r$ of the quantum system. Accordingly, the "closer" $\widetilde{M}$ is to
$M$, the higher its rank has to be. This is exactly the tradeoff that rigidity tries to measure.

This quantum approach allows us to quite easily derive Kashin and Razborov's [67] bound on
rigidity, with a better constant.

Theorem 3.5 (de Wolf, improving Kashin and Razborov) Let $M$ be an $n \times n$ Hadamard
matrix. If $r \le n/2$, then $R_M(r) \ge n^2/4r$.

Note that if $r \ge n/2$ then $R_M(r) \le n$, at least for symmetric Hadamard matrices such as $H^{\otimes k}$:
then $M$'s eigenvalues are all $\pm\sqrt{n}$, so we can reduce its rank to $n/2$ or less by adding or subtracting
the diagonal matrix $\sqrt{n} I$. Hence a superlinear lower bound on $R_M(r)$ cannot be proved for $r \ge n/2$.
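The remark about $r \ge n/2$ can be checked on a small instance. The sketch below builds the unnormalized 4-fold tensor power of the $2 \times 2$ Hadamard matrix, so that $n = 16$ and $\sqrt{n} = 4$ is an integer, subtracts $\sqrt{n} I$, and computes ranks exactly over the rationals. The helper names and the choice $k = 4$ are ours.

```python
from fractions import Fraction

def sylvester_hadamard(k):
    """Unnormalised k-fold tensor power of [[1, 1], [1, -1]], of size 2^k x 2^k."""
    M = [[1]]
    for _ in range(k):
        M = [row + row for row in M] + [row + [-v for v in row] for row in M]
    return M

def rank(matrix):
    """Exact rank over the rationals by Gaussian elimination."""
    A = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for col in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][col] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, len(A)):
            factor = A[i][col] / A[r][col]
            if factor != 0:
                A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

k = 4
n = 2 ** k  # n = 16, so sqrt(n) = 4
M = sylvester_hadamard(k)
shifted = [[M[i][j] - (4 if i == j else 0) for j in range(n)] for i in range(n)]
changed = sum(M[i][j] != shifted[i][j] for i in range(n) for j in range(n))
print(rank(M), rank(shifted), changed)
```

Changing only the $n$ diagonal entries halves the rank in this instance, in line with $R_M(n/2) \le n$.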

Proof. Consider a rank-$r$ matrix $\widetilde{M}$ differing from $M$ in $R = R_M(r)$ entries. By averaging, there
exists a set of $a = 2r$ rows of $\widetilde{M}$ with a total number of at most $aR/n$ errors (i.e., changes compared
to $M$). Now consider the submatrix $A$ of $\widetilde{M}$ consisting of those $a$ rows and the $b \ge n - aR/n$ columns
that have no errors in those $a$ rows. If $b = 0$ then $R \ge n^2/2r$ and we are done, so we can assume
$A$ is nonempty. This $A$ is error-free, hence a submatrix of $M$ itself. We now use the quantum idea
to prove the following claim (originally proved by Lokam using linear algebra, see the end of this
section):

Claim 3.6 (Lokam) Every $a \times b$ submatrix $A$ of an $n \times n$ Hadamard matrix $M$ has rank $r \ge ab/n$.

Proof. Obtain the rank-$r$ matrix $\widetilde{M}$ from $M$ by setting all entries outside of $A$ to 0. Consider
the $a$ quantum states $|\widetilde{M}_i\rangle$ corresponding to the nonempty rows; they have normalization factor
$1/\sqrt{b}$. Alice tries to communicate a value $i \in [a]$ to Bob by sending $|\widetilde{M}_i\rangle$. For each such $i$, Bob's
probability of successfully decoding $i$ is $p_i = |\langle M_i|\widetilde{M}_i\rangle|^2 = |b/\sqrt{bn}|^2 = b/n$. The states $|\widetilde{M}_i\rangle$ are all
contained in an $r$-dimensional space, so (3) implies $\sum_{i=1}^{a} p_i \le r$. Combining both bounds concludes
the proof. □
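Claim 3.6 is easy to test empirically on a small Hadamard matrix. The following sketch draws random $a \times b$ submatrices of the $16 \times 16$ matrix obtained as a tensor power of the $2 \times 2$ Hadamard matrix and checks that each has rank at least $ab/n$. This is a sanity check on one matrix, not a proof; the helper names, the seed and the number of samples are our own choices.

```python
import random
from fractions import Fraction

def sylvester_hadamard(k):
    """Unnormalised k-fold tensor power of [[1, 1], [1, -1]]."""
    M = [[1]]
    for _ in range(k):
        M = [row + row for row in M] + [row + [-v for v in row] for row in M]
    return M

def rank(matrix):
    """Exact rank over the rationals by Gaussian elimination."""
    A = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for col in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][col] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, len(A)):
            factor = A[i][col] / A[r][col]
            if factor != 0:
                A[i] = [x - factor * y for x, y in zip(A[i], A[r])]
        r += 1
    return r

rng = random.Random(1)  # fixed seed for reproducibility
n = 16
M = sylvester_hadamard(4)
checks = []
for _ in range(50):
    a, b = rng.randint(1, n), rng.randint(1, n)
    rows, cols = rng.sample(range(n), a), rng.sample(range(n), b)
    A = [[M[i][j] for j in cols] for i in rows]
    checks.append(rank(A) * n >= a * b)  # Claim 3.6: rank(A) >= ab/n
print(all(checks))
```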



Hence we get

$$r = \mathrm{rank}(\widetilde{M}) \ge \mathrm{rank}(A) \ge \frac{ab}{n} \ge \frac{a(n - aR/n)}{n}.$$

Substituting $a = 2r$ and rearranging gives $R \ge n^2/4r$, which is the theorem. □



Applying the quantum idea in a different way allows us to also analyze bounded rigidity: 






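Theorem 3.7 (Lokam, Kashin and Razborov, de Wolf) Let $M$ be an $n \times n$ Hadamard
matrix and $\theta > 0$. Then

$$R_M(r, \theta) \ge \frac{n^2 (n - r)}{2\theta n + r(\theta^2 + 2\theta)}.$$

Proof. Consider a rank-$r$ matrix $\widetilde{M}$ differing from $M$ in $R = R_M(r, \theta)$ entries, with each entry
$\widetilde{M}_{ij}$ differing from $M_{ij}$ by at most $\theta$. As before, define quantum states corresponding to its rows:
$|\widetilde{M}_i\rangle = c_i \sum_{j=1}^{n} \widetilde{M}_{ij} |j\rangle$, where

$$c_i = 1 \Big/ \sqrt{\sum_{j} |\widetilde{M}_{ij}|^2}$$

is a normalizing constant. Note that

$$\sum_{j} |\widetilde{M}_{ij}|^2 \le (n - d(M_i, \widetilde{M}_i)) + d(M_i, \widetilde{M}_i)(1 + \theta)^2 = n + d(M_i, \widetilde{M}_i)(\theta^2 + 2\theta),$$

where $d(\cdot, \cdot)$ measures Hamming distance. Alice again sends $|\widetilde{M}_i\rangle$ to Bob to communicate the value
$i \in [n]$. Bob's success probability $p_i$ is now

$$p_i = |\langle M_i | \widetilde{M}_i \rangle|^2 \ge \frac{c_i^2}{n}\left(n - \theta d(M_i, \widetilde{M}_i)\right)^2 \ge c_i^2 \left(n - 2\theta d(M_i, \widetilde{M}_i)\right) \ge \frac{n - 2\theta d(M_i, \widetilde{M}_i)}{n + d(M_i, \widetilde{M}_i)(\theta^2 + 2\theta)}.$$

Observe that our lower bound on $p_i$ is a convex function of the Hamming distance $d(M_i, \widetilde{M}_i)$. Also,
$E[d(M_i, \widetilde{M}_i)] = R/n$ over a uniform choice of $i$. Therefore by Jensen's inequality we obtain the
lower bound for the average success probability $\bar{p}$ when $i$ is uniform:

$$\bar{p} \ge \frac{n - 2\theta R/n}{n + R(\theta^2 + 2\theta)/n}.$$

Now (3) implies $\bar{p} \le r/n$. Combining and rearranging gives the theorem. □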

For $\theta \ge n/r$ we obtain the second result of Kashin and Razborov [67]:

$$R_M(r, \theta) = \Omega(n^2 (n - r)/r\theta^2).$$

If $\theta \le n/r$ we get an earlier result of Lokam [88]:

$$R_M(r, \theta) = \Omega(n(n - r)/\theta).$$
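The two regimes follow from Theorem 3.7 by bounding the denominator $2\theta n + r(\theta^2 + 2\theta)$: it is at most $5r\theta^2$ when $\theta \ge n/r$, and at most $5\theta n$ when $\theta \le n/r$. The sketch below checks this numerically on a few parameter settings. The constant 5 is one convenient choice worked out here and is not taken from [67] or [88]; the function name and the sample values are likewise ours.

```python
def bounded_rigidity_bound(n, r, theta):
    """Right-hand side of the bound in Theorem 3.7."""
    return n ** 2 * (n - r) / (2 * theta * n + r * (theta ** 2 + 2 * theta))

n = 1024
results = []
for r in (1, 16, 256, 1000):
    for theta in (0.5, 1.0, 2.0, n / r, 4 * n / r):
        bound = bounded_rigidity_bound(n, r, theta)
        if theta >= n / r:
            # Regime of Kashin and Razborov: Omega(n^2 (n - r) / (r theta^2)).
            simple = n ** 2 * (n - r) / (5 * r * theta ** 2)
        else:
            # Regime of Lokam: Omega(n (n - r) / theta).
            simple = n * (n - r) / (5 * theta)
        # Small relative slack guards against floating-point rounding.
        results.append(bound >= simple * (1 - 1e-12))
print(all(results))
```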

Did we need quantum tools for this? Apart from Claim 3.6, the proof of Theorem 3.5 is fully
classical, and that claim itself can quite easily be proved using linear algebra, as was done originally
by Lokam [88, Corollary 2.2]. Let $\sigma_1(A), \ldots, \sigma_r(A)$ be the singular values of the rank-$r$ submatrix $A$.
Since $M$ is an orthogonal matrix we have $M^T M = nI$, so all of $M$'s singular values equal $\sqrt{n}$. The
matrix $A$ is a submatrix of $M$, so all $\sigma_i(A)$ are at most $\sqrt{n}$. Using the Frobenius norm, we obtain
the claim:

$$ab = \|A\|_F^2 = \sum_{i=1}^{r} \sigma_i(A)^2 \le rn.$$

Furthermore, after reading a first version of [125], Midrijanis [93] came up with an even simpler
proof of the $n^2/4r$ bound on rigidity for the special case of $2^k \times 2^k$ Hadamard matrices that are
the $k$-fold tensor product of the $2 \times 2$ Hadamard matrix.

In view of these simple non-quantum proofs, one might argue that the quantum approach is
overkill here. However, the main point here was not to rederive more or less known bounds,
but to show how quantum tools provide a quite different perspective on the problem: we can view
a rank-$r$ approximation of the Hadamard matrix as a way of encoding $[n]$ in an $r$-dimensional
quantum system; quantum information-theoretic bounds such as (3) can then be invoked to obtain
a tradeoff between the rank $r$ and the "quality" of the approximation. The same idea was used to
prove Theorem 3.7, whose proof cannot be so easily de-quantized. The hope is that this perspective
may help in the future to settle some of the longstanding open problems about rigidity.

4 Using the connection with polynomials 

The results of this section are based on the connection explained at the end of Section 2.3: efficient
quantum query algorithms give rise to low-degree polynomials.

As a warm-up, we mention a recent application of this. A formula is a binary tree whose
internal nodes are AND and OR gates, and each leaf is a Boolean input variable $x_i$ or its negation.
The root of the tree computes a Boolean function of the input bits in the obvious way. The size of
the formula is its number of leaves. O'Donnell and Servedio [99] conjectured that all formulas of
size $n$ have sign-degree at most $O(\sqrt{n})$; the sign-degree is the minimal degree among all $n$-variate
polynomials that are positive if, and only if, the formula is 1. Their conjecture implies, by known
results, that the class of formulas is learnable in the PAC model in time $2^{n^{1/2+o(1)}}$.

Building on a quantum algorithm of Farhi et al. [45] that was inspired by physical notions from
scattering theory, Ambainis et al. [16] showed that for every formula there is a quantum algorithm
that computes it using $n^{1/2+o(1)}$ queries. By Corollary 2.4, the acceptance probability of this
algorithm is an approximating polynomial for the formula, of degree $n^{1/2+o(1)}$. Hence that polynomial
minus 1/2 is a sign-representing polynomial for the formula, proving the conjecture of O'Donnell
and Servedio up to the $o(1)$ in the exponent. Based on an improved $O(\sqrt{n}\,\log(n)/\log\log(n))$-query
quantum algorithm by Reichardt [108] and some additional analysis, Lee [80] subsequently
improved this general upper bound on the sign-degree of formulas to the optimal $O(\sqrt{n})$, fully
proving the conjecture (in contrast to [16], he really bounds sign-degree, not approximate degree).

4.1 $\varepsilon$-approximating polynomials for symmetric functions

Our next example comes from [126], and deals with the minimal degree of $\varepsilon$-approximating
polynomials for symmetric Boolean functions. A function $f : \{0,1\}^n \to \{0,1\}$ is symmetric if its value
only depends on the Hamming weight $|x|$ of its input $x \in \{0,1\}^n$. Equivalently, $f(x) = f(\pi(x))$ for
all $x \in \{0,1\}^n$ and all permutations $\pi \in S_n$. Examples are OR, AND, Parity, and Majority.

For some specified approximation error $\varepsilon$, let $\deg_\varepsilon(f)$ denote the minimal degree among all
$n$-variate multilinear polynomials $p$ satisfying $|p(x) - f(x)| \le \varepsilon$ for all $x \in \{0,1\}^n$. If one is
interested in constant error then one typically fixes $\varepsilon = 1/3$, since approximations with different
constant errors can easily be converted into each other. Paturi [100] tightly characterized the
1/3-error approximate degree: if $t \in (0, n/2]$ is the smallest integer such that $f$ is constant for
$|x| \in \{t, \ldots, n - t\}$, then $\deg_{1/3}(f) = \Theta(\sqrt{tn})$.

Motivated by an application to the inclusion-exclusion principle of probability theory,
Sherstov [114] recently studied the dependence of the degree on the error $\varepsilon$. He proved the surprisingly
clean result that for all $\varepsilon \in [2^{-n}, 1/3]$,

$$\deg_\varepsilon(f) = \widetilde{\Theta}\left(\deg_{1/3}(f) + \sqrt{n \log(1/\varepsilon)}\right),$$

where the $\widetilde{\Theta}$ notation hides some logarithmic factors (note that the statement is false if
$\varepsilon \ll 2^{-n}$, since clearly $\deg(f) \le n$ for all $f$). His upper bound on the degree is based on Chebyshev
polynomials. De Wolf [126] tightens this upper bound on the degree:

Theorem 4.1 (de Wolf, improving Sherstov) For every non-constant symmetric function
$f : \{0,1\}^n \to \{0,1\}$ and $\varepsilon \in [2^{-n}, 1/3]$:

$$\deg_\varepsilon(f) = O\left(\deg_{1/3}(f) + \sqrt{n \log(1/\varepsilon)}\right).$$

By the discussion at the end of Section 2.3, to prove Theorem 4.1 it suffices to give an $\varepsilon$-error
quantum algorithm for $f$ that uses $O(\deg_{1/3}(f) + \sqrt{n \log(1/\varepsilon)})$ queries. The probability that the
algorithm outputs 1 will be our $\varepsilon$-error polynomial. For example, the special case where $f$ is the
$n$-bit OR function follows immediately from the $O(\sqrt{n \log(1/\varepsilon)})$-query search algorithm with error
probability $\varepsilon$ that was mentioned there.

Here is the algorithm for general symmetric /. It uses some of the algorithms listed in Section I2T31 
as subroutines. Let t = t(f) be as in Paturi's result. 

1. Use t— 1 applications of exact Grover to try to find up to t— 1 distinct solutions in x (remember 
that a "solution" to the search problem is an index i such that Xi = 1). Initially we run an 
exact Grover assuming \x\ = t — 1, we verify that the outcome is a solution at the expense 
of one more query, and then we "cross it out" to prevent finding the same solution again in 
subsequent searches. Then we run another exact Grover assuming there are t — 2 solutions, 
etc. Overall, this costs 

t-i 

E°(V^A) = 0(y/tR) = 0(deg 1/3 (/)) 

8=1 

queries. 

2. Use e/2-error Grover to try to find one more solution. This costs 0(y/n log(l/e)) queries. 

3. The same as step 1, but now looking for positions of 0s instead of Is. 

4. The same as step 2, but now looking for positions of 0s instead of Is. 

5. If step 2 did not find another 1, then we assume step 1 found all 1s (i.e., a complete description
of $x$), and we output the corresponding value of $f$.

Else, if step 4 did not find another 0, then we assume step 3 found all 0s, and we output the
corresponding value of $f$.

Otherwise, we assume $|x| \in \{t, \ldots, n-t\}$ and output the corresponding value of $f$.
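The cost claimed for step 1 rests on the elementary bound $\sum_{s=1}^{t-1} s^{-1/2} \le 2\sqrt{t}$. The following is a minimal numerical sketch of that bound, ignoring the constants hidden in the $O(\cdot)$; the function names are hypothetical and chosen only for this illustration.

```python
import math

def step1_query_cost(n: int, t: int) -> float:
    """Sum of sqrt(n/s) for s = 1..t-1: the cost of step 1, up to the hidden constant."""
    return sum(math.sqrt(n / s) for s in range(1, t))

def closed_form_bound(n: int, t: int) -> float:
    """Integral comparison gives sum_{s=1}^{t-1} s^(-1/2) <= 2*sqrt(t), hence 2*sqrt(t*n)."""
    return 2.0 * math.sqrt(t * n)

if __name__ == "__main__":
    for n, t in [(100, 5), (1024, 32), (1024, 512)]:
        print(n, t, step1_query_cost(n, t), closed_form_bound(n, t))
```

The sum never exceeds the closed form, which is why the $t-1$ exact Grover runs together cost only $O(\sqrt{tn})$ queries.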






Clearly the query complexity of this algorithm is $O(\deg_{1/3}(f) + \sqrt{n \log(1/\varepsilon)})$, so it remains to
upper bound its error probability. If $|x| < t$ then step 1 finds all 1s with certainty and step 2 will
not find another 1 (since there aren't any left after step 1), so in this case the error probability is 0.
If $|x| > n-t$ then step 2 finds a 1 with probability at least $1 - \varepsilon/2$, step 3 finds all 0s with certainty,
and step 4 does not find another 0 (again, because there are none left); hence in this case the error
probability is at most $\varepsilon/2$. Finally, if $|x| \in \{t, \ldots, n-t\}$ then with probability at least $1 - \varepsilon/2$
step 2 will find another 1, and with probability at least $1 - \varepsilon/2$ step 4 will find another 0. Thus with
probability at least $1 - \varepsilon$ we correctly conclude $|x| \in \{t, \ldots, n-t\}$ and output the correct value of $f$.
Note that the only property of $f$ used here is that $f$ is constant on $|x| \in \{t, \ldots, n-t\}$; the algorithm
still works for Boolean functions $f$ that are arbitrary (non-symmetric) when $|x| \notin \{t, \ldots, n-t\}$,
with the same query complexity $O(\sqrt{tn} + \sqrt{n \log(1/\varepsilon)})$.

4.2 Robust polynomials 

In the previous section we saw how quantum query algorithms allow us to construct polynomials
(of essentially minimal degree) that $\varepsilon$-approximate symmetric Boolean functions. In this section we
show how to construct robust polynomial approximations. These are insensitive to small changes
in their $n$ input variables. Let us first define more precisely what we mean:

Definition 4.2 Let $p : \mathbb{R}^n \to \mathbb{R}$ be an $n$-variate polynomial (not necessarily multilinear). Then $p$
$\varepsilon$-robustly approximates $f : \{0,1\}^n \to \{0,1\}$ if for every $x \in \{0,1\}^n$ and every $z \in [0,1]^n$ satisfying
$|z_i - x_i| \le \varepsilon$ for all $i \in [n]$, we have $p(z) \in [0,1]$ and $|p(z) - f(x)| \le \varepsilon$.

Note that we do not restrict $p$ to be multilinear, since the inputs we care about are no longer
0/1-valued. The degree of $p$ is its total degree. Note that we require both the $z_i$'s and the value $p(z)$
to be in the interval $[0,1]$. This is just a matter of convenience, because it allows us to interpret
these numbers as probabilities; using the interval $[-\varepsilon, 1+\varepsilon]$ instead of $[0,1]$ would give an essentially
equivalent definition.

One advantage of the class of robust polynomials over the usual approximating polynomials is
that it is closed under composition: plugging robust polynomials into a robust polynomial gives
another robust polynomial. For example, suppose a function $f : \{0,1\}^{n_1 n_2} \to \{0,1\}$ is obtained by
composing $f_1 : \{0,1\}^{n_1} \to \{0,1\}$ with $n_1$ independent copies of $f_2 : \{0,1\}^{n_2} \to \{0,1\}$ (for instance
an AND-OR tree). Then we can just compose an $\varepsilon$-robust polynomial for $f_1$ of degree $d_1$ with an
$\varepsilon$-robust polynomial for $f_2$ of degree $d_2$, to obtain an $\varepsilon$-robust polynomial for $f$ of degree $d_1 d_2$. The
errors "take care of themselves," in contrast to ordinary approximating polynomials, which may
not compose in this fashion.$^6$

How hard is it to construct robust polynomials? In particular, does their degree have to be
much larger than the usual approximate degree? A good example is the $n$-bit Parity function. If
the $n$ inputs are 0/1-valued then the following polynomial represents Parity:$^7$

\[ p(x) = \frac{1}{2} - \frac{1}{2} \prod_{i=1}^{n} (1 - 2x_i). \tag{5} \]

6 Reichardt [109] showed recently that such a clean composition result also holds for the usual bounded-error
quantum query complexity, by going back and forth between quantum algorithms and span programs (which compose
cleanly).

7 If inputs and outputs were $\pm 1$-valued, the polynomial would just be the product of the $n$ variables.






This polynomial has degree $n$, and it is known that any $\varepsilon$-approximating polynomial for Parity
needs degree $n$ as well. However, it is clear that this polynomial is not robust: if each $x_i = 0$ is
replaced by $z_i = \varepsilon$, then the resulting value $p(z)$ is exponentially close to 1/2 rather than $\varepsilon$-close
to the correct value 0. One way to make it robust is to individually "amplify" each input variable
$z_i$, such that if $z_i \in [0, \varepsilon]$ then its amplified version is in, say, $[0, 1/100n]$ and if $z_i \in [1-\varepsilon, 1]$ then
its amplified version is in $[1 - 1/100n, 1]$. The following univariate polynomial of degree $k$ does the
trick:

\[ a(y) = \sum_{j > k/2} \binom{k}{j} y^j (1-y)^{k-j}. \]
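As a small numerical sketch of both points (the non-robustness of the Parity polynomial and the effect of $a(y)$), the snippet below evaluates the two polynomials directly. The function names and the parameter choices ($n = 50$, $\varepsilon = 0.1$, $k = 101$) are assumptions made for this illustration only.

```python
from math import comb

def parity_poly(z):
    """p(z) = 1/2 - 1/2 * prod_i (1 - 2 z_i); equals Parity on 0/1 inputs."""
    prod = 1.0
    for zi in z:
        prod *= 1.0 - 2.0 * zi
    return 0.5 - 0.5 * prod

def amplify(y, k):
    """a(y) = sum_{j > k/2} C(k, j) y^j (1-y)^(k-j), the majority-of-k-coins probability."""
    return sum(comb(k, j) * y**j * (1.0 - y)**(k - j) for j in range(k // 2 + 1, k + 1))

if __name__ == "__main__":
    n, eps, k = 50, 0.1, 101
    print(parity_poly([eps] * n))              # near 1/2, far from the correct value 0
    print(amplify(eps, k))                     # far below 1/(100 n)
    print(parity_poly([amplify(eps, k)] * n))  # close to the correct value 0
```

Feeding the amplified values into the Parity polynomial restores robustness, at the price of multiplying the degree by $k$.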

Note that this polynomial describes the probability that $k$ coin flips, each with probability $y$ of
being 1, have majority 1. By standard Chernoff bounds, if $y \in [0, \varepsilon]$ then $a(y) \in [0, \exp(-\Omega(k))]$
and if $y \in [1-\varepsilon, 1]$ then $a(y) \in [1 - \exp(-\Omega(k)), 1]$. Taking $k = O(\log n)$ and substituting $a(z_i)$ for
$x_i$ in (5) gives an $\varepsilon$-robust polynomial for Parity of degree $O(n \log n)$. Is this optimal? Since Parity
crucially depends on each of its $n$ variables, and amplifying each $z_i$ to polynomially small error
needs degree $\Theta(\log n)$, one might conjecture robust polynomials for Parity need degree $\Omega(n \log n)$.
Surprisingly, this is not the case: there exist $\varepsilon$-robust polynomials for Parity of degree $O(n)$. Even
more surprisingly, the only way we know how to construct such robust polynomials is via the
connection with quantum algorithms. Based on the quantum search algorithm for bounded-error
inputs mentioned in Section 2.3, Buhrman et al. [32] showed the following:

Theorem 4.3 (BNRW) There exists a quantum algorithm that makes $O(n)$ queries to an
$\varepsilon$-bounded-error quantum oracle and outputs $x_1, \ldots, x_n$ with probability at least $1 - \varepsilon$.

The constant in the $O(\cdot)$ depends on $\varepsilon$, but we will not write this dependence explicitly.

Proof (sketch). The idea is to maintain an $n$-bit string $\tilde{x}$, initially all-0, and to look
for differences between $\tilde{x}$ and $x$. Initially this number of differences is $|x|$. If there are $t$ difference
points (i.e., $i \in [n]$ where $x_i \ne \tilde{x}_i$), then the quantum search algorithm $A$ with bounded-error inputs
finds a difference point $i$ with high probability using $O(\sqrt{n/t})$ queries. We flip the $i$-th bit of $\tilde{x}$.
If the search indeed yielded a difference point, then this reduces the distance between $\tilde{x}$ and $x$ by
one. Once there are no differences left, we have $\tilde{x} = x$, which we can verify by one more run of $A$.
If $A$ only finds difference points, then we would find all differences in total number of queries

\[ \sum_{t=1}^{|x|} O(\sqrt{n/t}) = O(\sqrt{|x| n}). \]

The technical difficulty is that $A$ errs (i.e., produces an output $i$ where actually $x_i = \tilde{x}_i$) with
constant probability, and hence we sometimes increase rather than decrease the distance between
$\tilde{x}$ and $x$. The proof details in [32] show that the procedure is still expected to make progress, and
with high probability finds all differences after $O(n)$ queries.$^8$ $\Box$

This algorithm implies that we can compute, with $O(n)$ queries and error probability $\varepsilon$, any
Boolean function $f : \{0,1\}^n \to \{0,1\}$ on $\varepsilon$-bounded-error inputs: just compute $x$ and output $f(x)$.
This is not true for classical algorithms running on bounded-error inputs. In particular, classical
algorithms that compute Parity with such a noisy oracle need $\Theta(n \log n)$ queries [46].

8 The same idea would work with classical algorithms, but gives query complexity roughly $\sum_{t=1}^{|x|} n/t \sim n \ln |x|$.

The above algorithm for $f$ is "robust" in a very similar way as robust polynomials: its output
is hardly affected by small errors on its input bits. We now want to derive a robust polynomial
from this robust algorithm. However, Corollary 2.4 only deals with algorithms acting on the usual
non-noisy type of oracles. We circumvent this problem as follows. Pick a sufficiently large integer
$m$, and fix error-fractions $\varepsilon_i \in [0, \varepsilon]$ that are multiples of $1/m$. Convert an input $x \in \{0,1\}^n$ into
$X \in \{0,1\}^{nm} = X_1 \ldots X_n$, where each $X_i$ is $m$ copies of $x_i$ but with an $\varepsilon_i$-fraction of errors (the
errors can be placed arbitrarily among the $m$ copies of $x_i$). Note that the following map is an
$\varepsilon$-bounded-error oracle for $x$ that can be implemented by one query to $X$:



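\[ |i, b, 0\rangle \mapsto |i\rangle \frac{1}{\sqrt{m}} \sum_{j=1}^{m} |b \oplus X_{ij}\rangle |j\rangle = \sqrt{1 - \varepsilon_i}\, |i, b \oplus x_i, w_i\rangle + \sqrt{\varepsilon_i}\, |i, \overline{b \oplus x_i}, w_i'\rangle. \]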



Now consider the algorithm that Theorem 4.3 provides for this oracle. This algorithm makes $O(n)$
queries to $X$, it is independent of the specific values of $\varepsilon_i$ or the way the errors are distributed over
$X_i$, and it has success probability $\ge 1 - \varepsilon$ as long as $\varepsilon_i \le \varepsilon$ for each $i \in [n]$. Applying Corollary 2.4
to this algorithm gives an $nm$-variate multilinear polynomial $p$ in $X$ of degree $d = O(n)$. This $p(X)$
lies in $[0,1]$ for every input $X \in \{0,1\}^{nm}$ (since it is a success probability), and has the property
that $p(X_1, \ldots, X_n)$ is $\varepsilon$-close to $f(x_1, \ldots, x_n)$ whenever $|X_i|/m$ is $\varepsilon$-close to $x_i$ for each $i$.

It remains to turn each block $X_i$ of $m$ Boolean variables into one real-valued variable $z_i$. This
can be done by the method of symmetrization [95] as follows. Define a new polynomial $p_1$ which
averages $p$ over all permutations of the $m$ bits in $X_1$:

\[ p_1(X_1, \ldots, X_n) = \frac{1}{m!} \sum_{\pi \in S_m} p(\pi(X_1), X_2, \ldots, X_n). \]

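Symmetrization replaces terms like $X_{11} \cdots X_{1t}$ by $V_t(X_1) / \binom{m}{t}$, where

\[ V_t(X_1) = \sum_{T \subseteq [m],\, |T| = t} \; \prod_{j \in T} X_{1j}. \]

Therefore $p_1$ will be a linear combination of terms of the form $V_t(X_1)\, r(X_2, \ldots, X_n)$ for $t \le d - \deg(r)$.
On $X_1 \in \{0,1\}^m$ of Hamming weight $|X_1|$, the sum $V_t(X_1)$ evaluates to

\[ \binom{|X_1|}{t} = \frac{|X_1| (|X_1| - 1) \cdots (|X_1| - t + 1)}{t!}, \]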

which is a polynomial in $|X_1| = \sum_{j=1}^{m} X_{1j}$ of degree $t$. Hence we can define $z_1 = |X_1|/m$, and replace
$p_1$ by a polynomial $q_1$ of total degree at most $d$ in $z_1, X_2, \ldots, X_n$, such that $p_1(X_1, \ldots, X_n) =
q_1(|X_1|/m, X_2, \ldots, X_n)$. We thus succeeded in replacing the block $X_1$ by one real variable $z_1$.
Repeating this for $X_2, \ldots, X_n$, we end up with a polynomial $q(z_1, \ldots, z_n)$ such that $p(X_1, \ldots, X_n) =
q(|X_1|/m, \ldots, |X_n|/m)$ for all $X_1 \ldots X_n \in \{0,1\}^{nm}$. This $q$ will not be multilinear anymore, but
it has degree at most $d = O(n)$ and it $\varepsilon$-robustly approximates $f$: for every $x \in \{0,1\}^n$ and for
every $z \in [0,1]^n$ satisfying $|z_i - x_i| \le \varepsilon$ for all $i \in [n]$, we have that $q(z)$ and $f(x)$ are $\varepsilon$-close.
(Strictly speaking we have only dealt with the case where the $z_i$ are multiples of $1/m$, but we can
choose $m$ as large as we want and a low-degree polynomial cannot change much if its input varies
between $i/m$ and $(i+1)/m$.)
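The key step of the symmetrization argument, that averaging a degree-$t$ monomial over all permutations of a block yields a function of the block's Hamming weight only, can be checked by brute force on a small block. This is only a sanity-check sketch; the helper names are hypothetical and the block size is kept tiny because all $m!$ permutations are enumerated.

```python
from itertools import permutations, product
from math import comb, factorial, prod

def symmetrized_monomial(bits, t):
    """Average of X_{pi(1)} * ... * X_{pi(t)} over all permutations pi of the m positions."""
    m = len(bits)
    total = sum(prod(bits[p[j]] for j in range(t)) for p in permutations(range(m)))
    return total / factorial(m)

def closed_form(bits, t):
    """binom(|X|, t) / binom(m, t): a degree-t polynomial in the Hamming weight |X|."""
    return comb(sum(bits), t) / comb(len(bits), t)

if __name__ == "__main__":
    for bits in product([0, 1], repeat=4):
        print(bits, symmetrized_monomial(bits, 2), closed_form(bits, 2))
```

Since the closed form depends on the block only through $|X|$, it can be rewritten in terms of the single real variable $z = |X|/m$ without raising the degree.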
Corollary 4.4 (BNRW) For every Boolean function $f$, there exists an $n$-variate polynomial of
degree $O(n)$ that $\varepsilon$-robustly approximates $f$.

4.3 Closure properties of PP 

The important classical complexity class PP consists of all languages $L$ for which there exists a
probabilistic polynomial-time algorithm that accepts an input $x$ with probability at least 1/2 if
$x \in L$, and with probability less than 1/2 if $x \notin L$. Note that under this criterion, the algorithm's
acceptance probabilities may be extremely close to 1/2, so PP is not a realistic definition of the
class of languages feasibly computable with classical randomness. Indeed, it is not hard to see that
PP contains NP. Still, PP is worthy of study because of its many relations to other complexity
classes.

One of the most basic questions about a complexity class $C$ is which closure properties it
possesses. For example, if $L_1, L_2 \in C$, is $L_1 \cap L_2 \in C$? That is, is $C$ closed under intersection? In
the case of PP, this question was posed by Gill [65], who defined the class, and was open for many
years before being answered affirmatively by Beigel et al. [25]. It is now known that PP is closed
under significantly more general operations [25, 48, 2]. Aaronson [2] gave a new and arguably more
intuitive proof of the known closure properties of PP, by providing a quantum characterization of
PP.

To describe this result, we first briefly introduce the model of quantum polynomial-time
computation. A quantum circuit is a sequence of unitary operations $U_1, \ldots, U_T$, applied to the initial
state $|x\rangle|0^m\rangle$, where $x \in \{0,1\}^n$ is the input to the circuit and $|0^m\rangle$ is an auxiliary workspace. By
analogy with classical circuits, we require that each $U_t$ be a local operation which acts on a constant
number of qubits. For concreteness, we require each $U_t$ to be a Hadamard gate, or the single-qubit
operation

\[ \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}, \]

or the two-qubit controlled-NOT gate (which maps computational basis states $|a, b\rangle \mapsto |a, a \oplus b\rangle$).
A computation ends by measuring the first workspace qubit. We say that such a circuit computes
a function $f_n : \{0,1\}^n \to \{0,1\}$ with bounded error if on each $x \in \{0,1\}^n$, the final measurement
equals $f_n(x)$ with probability at least 2/3. BQP is the class of languages computable with bounded
error by a logspace-uniform family of polynomial-size quantum circuits. Here, both the workspace
size and the number of unitaries are required to be polynomial. The collection of gates we have
chosen is universal, in the sense that it can efficiently simulate any other collection of local unitaries
to within any desired precision [98, Section 4.5.3]. Thus our definition of BQP is a robust one.

In [2], Aaronson investigated the power of a "fantasy" extension of quantum computing in
which an algorithm may specify a desired outcome of a measurement in the standard basis, and
then condition the quantum state upon seeing that outcome (we require that this event have nonzero
probability). Formally, if a quantum algorithm is in the pure state $|\psi\rangle = |\psi_0\rangle|0\rangle + |\psi_1\rangle|1\rangle$ (where
we have distinguished a 1-qubit register of interest, and $|\psi_1\rangle$ is non-zero), then the postselection
transformation carries $|\psi\rangle$ to

\[ \frac{|\psi_1\rangle|1\rangle}{\sqrt{\langle \psi_1 | \psi_1 \rangle}}. \]

The complexity class PostBQP is defined as the class of languages computable with bounded
error by a logspace-uniform family of polynomial-size quantum circuits that are allowed to contain
postselection gates. We have:

Theorem 4.5 (Aaronson) PP = PostBQP. 

From Theorem 4.5, the known closure properties of PP follow easily. For example, it is clear that
if $L_1, L_2 \in$ PostBQP, then we may amplify the success probabilities in the PostBQP algorithms for
these languages, then simulate them and take their AND to get a PostBQP algorithm for $L_1 \cap L_2$.
This shows that PostBQP (and hence also PP) is closed under intersection.

Proof (sketch). We begin with a useful claim about postselection: any quantum algorithm
with postselection can be modified to make just a single postselection step after all its unitary
transformations (but before its final measurement). We say that such a postselection algorithm is
in canonical form. To achieve this, given any PostBQP algorithm $A$ for a language $L$, consider a new
algorithm $A'$ which on input $x$, simulates $A(x)$. Each time $A$ makes a postselecting measurement
on a qubit, $A'$ instead records that qubit's value into a fresh auxiliary qubit. At the end of the
simulation, $A'$ postselects on the event that all these recorded values are 1, by computing their AND
in a final auxiliary qubit $|z\rangle$ and postselecting on $|z\rangle = |1\rangle$. The final state of $A'(x)$ is equivalent to
the final state of $A(x)$, so $A'$ is a PostBQP algorithm for $L$ and is in canonical form. This conversion
makes it easy to show that PostBQP $\subseteq$ PP, by the same techniques which show BQP $\subseteq$ PP [5].
We omit the details, and turn to show PP $\subseteq$ PostBQP.

Let $L \in$ PP, and let $M$ be a probabilistic polynomial-time algorithm witnessing this fact. Say
$M$ uses $m = \mathrm{poly}(n)$ random bits on an input $x$ of length $n$. Then any such input defines a function
$g = g_x : \{0,1\}^m \to \{0,1\}$, by the rule

g(r) = [M(x) accepts when using r as its random string] .

By definition, $x \in L \Leftrightarrow |g^{-1}(1)| \ge 2^{m-1}$. We show how to determine whether $|g^{-1}(1)| \ge 2^{m-1}$,
in quantum polynomial time with postselection. Let $s = |g^{-1}(1)|$; we may assume without loss of
generality that $s > 0$. The core of the algorithm is the following subroutine, which uses postselection
to produce a useful quantum state:

A1. Initialize an $(m+1)$-bit register to $|0^{m+1}\rangle$. Apply $H^{\otimes m}$ to the first $m$ qubits, then apply $g$
to these qubits and add the result into the $(m+1)$-st qubit, yielding the state

\[ \frac{1}{\sqrt{2^m}} \sum_{x \in \{0,1\}^m} |x\rangle |g(x)\rangle. \]



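A2. Apply a Hadamard gate to each qubit of the $x$-register, yielding

\[ \frac{1}{2^m} \sum_{x \in \{0,1\}^m} \left( \sum_{w \in \{0,1\}^m} (-1)^{w \cdot x} |w\rangle \right) |g(x)\rangle, \]

where $w \cdot x = \sum_i w_i x_i$ denotes inner product of bit strings. Note that $0^m \cdot x = 0$ for all $x$, so
the component of the above state with first register equal to $0^m$ is

\[ \frac{1}{2^m} \left( (2^m - s)|0\rangle + s|1\rangle \right). \]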
A3. Postselect on the first $m$ qubits measuring to $0^m$. This yields the reduced state

\[ |\psi\rangle = \frac{(2^m - s)|0\rangle + s|1\rangle}{\sqrt{(2^m - s)^2 + s^2}} \]

on the last qubit.
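Steps A1-A3 can be tracked classically for a small truth table. The sketch below computes the amplitudes after the Hadamard layer by brute force and then keeps only the $w = 0^m$ component; the function name and the example table are assumptions made for this illustration, and the simulation takes time exponential in $m$, so it says nothing about efficiency.

```python
import math

def hadamard_then_postselect(g_table):
    """Track A1-A3 for g given as a truth table of length 2^m.

    Returns the normalized amplitudes (on |0>, |1>) of the last qubit after
    postselecting the first m qubits on 0^m.
    """
    size = len(g_table)  # 2^m
    # amps[w][b]: amplitude on |w>|b> after A1 and A2, namely
    # 2^{-m} * sum over x with g(x) = b of (-1)^{w . x}.
    amps = [[0.0, 0.0] for _ in range(size)]
    for w in range(size):
        for x in range(size):
            sign = -1.0 if bin(w & x).count("1") % 2 else 1.0
            amps[w][g_table[x]] += sign / size
    a0, a1 = amps[0]  # A3: keep only the component with first register 0^m
    norm = math.hypot(a0, a1)
    return a0 / norm, a1 / norm

if __name__ == "__main__":
    g = [0, 1, 1, 0, 0, 0, 1, 0]  # m = 3, s = 3
    print(hadamard_then_postselect(g))
```

For this table the result is proportional to $(2^m - s, s) = (5, 3)$, matching the reduced state in A3.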

Using this subroutine, we can prepare fresh copies of $|\psi\rangle$ on demand. Now let $(\alpha, \beta)$ be a pair
of positive reals to be specified later, satisfying $\alpha^2 + \beta^2 = 1$. Using this pair, we define a second
subroutine:

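B1. Prepare a qubit $|z\rangle = \alpha|0\rangle + \beta|1\rangle$. Perform a controlled-Hadamard operation on a fresh copy
of $|\psi\rangle$ with control qubit $|z\rangle$, yielding the joint state $\alpha|0\rangle|\psi\rangle + \beta|1\rangle H|\psi\rangle$. Note that

\[ H|\psi\rangle = \frac{\frac{1}{\sqrt{2}} \left( 2^m |0\rangle + (2^m - 2s)|1\rangle \right)}{\sqrt{(2^m - s)^2 + s^2}}. \]

B2. Now postselect on the second qubit measuring to 1, yielding the reduced state

\[ |z'\rangle = \frac{\alpha s |0\rangle + \beta \frac{1}{\sqrt{2}} (2^m - 2s)|1\rangle}{\sqrt{\alpha^2 s^2 + (\beta^2/2)(2^m - 2s)^2}}. \]

Perform a projective measurement relative to the basis $\{|+\rangle, |-\rangle\}$, where $|\pm\rangle = (|0\rangle \pm |1\rangle)/\sqrt{2}$,
recording the result.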

Finally, our algorithm to determine if $s \ge 2^{m-1}$ is as follows: we perform steps B1-B2 for each choice
of $(\alpha, \beta)$ from a collection $\{(\alpha_i, \beta_i)\}_{i=-m}^{m}$ of pairs chosen to satisfy $\alpha_i/\beta_i = 2^i$, $i \in \{-m, \ldots, m\}$.
For each choice of $i$, we repeat steps B1-B2 for $O(\log m)$ trials, and use the results to estimate the
probability $|\langle + | z' \rangle|^2$ of outcome $|+\rangle$ under $(\alpha_i, \beta_i)$.

The idea of the analysis is that, if $s < 2^{m-1}$, then $|z'\rangle$ is in the strictly-positive quadrant of
the real plane with axes $|0\rangle, |1\rangle$; here we are using that $\alpha, \beta$ are positive. Then a suitable choice of
$(\alpha_i, \beta_i)$ will cause $|z'\rangle$ to be closely aligned with $|+\rangle$, and will yield measurement outcome $|+\rangle$ with
probability close to 1. On the other hand, if $s \ge 2^{m-1}$, then $|z'\rangle$ is never in the same quadrant as
$|+\rangle$ or the equivalent state $-|+\rangle$. Then, for any choice of $(\alpha, \beta)$ we have $|\langle + | z' \rangle| \le 1/\sqrt{2}$ and the
probability of measuring $|+\rangle$ is at most 1/2. Thus our repeated trials allow us to determine with
high probability whether or not $s \ge 2^{m-1}$. We conclude that PP $\subseteq$ PostBQP. $\Box$
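The separation used in this analysis can be checked numerically from the formula for $|z'\rangle$. The sketch below computes $|\langle + | z' \rangle|^2$ exactly (no sampling) for every $i \in \{-m, \ldots, m\}$; the function names and the choice $m = 6$ are assumptions for illustration. A short calculation, not spelled out in the text, suggests that for $s < 2^{m-1}$ the best $i$ brings the two amplitudes within a factor $\sqrt{2}$ of each other, giving probability at least $(1+\sqrt{2})^2/6 \approx 0.971$.

```python
import math

def plus_probability(s, m, i):
    """|<+|z'>|^2 for the postselected state |z'> when alpha_i / beta_i = 2^i."""
    beta = 1.0 / math.sqrt(1.0 + 4.0 ** i)      # ensures alpha^2 + beta^2 = 1
    alpha = (2.0 ** i) * beta
    a = alpha * s                                # unnormalized amplitude on |0>
    b = beta * (2 ** m - 2 * s) / math.sqrt(2)   # unnormalized amplitude on |1>
    return (a + b) ** 2 / (2.0 * (a * a + b * b))

def best_plus_probability(s, m):
    """Largest probability of outcome |+> over all pairs (alpha_i, beta_i)."""
    return max(plus_probability(s, m, i) for i in range(-m, m + 1))

if __name__ == "__main__":
    m = 6
    for s in (1, 10, 31, 32, 40, 64):
        print(s, round(best_plus_probability(s, m), 4))
```

Values of $s$ below $2^{m-1}$ give a best probability close to 1, while values at or above $2^{m-1}$ never exceed 1/2, so a modest number of repetitions per $i$ distinguishes the two cases.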



The original proof that PP is closed under intersection [25] crucially relied on ideas from
approximation by rational functions, i.e., by ratios of polynomials. Aaronson's proof can also be seen
as giving a new method for building rational approximations. To see this, note that the
postselection operation also makes sense in the quantum query model, and that in this model as well
algorithms may be put in canonical form (with a single postselection). Suppose a canonical-form
query algorithm with postselection makes $T$ quantum queries. If its state before the postselection
step is $|\psi^x\rangle = |\psi_0^x\rangle|0\rangle + |\psi_1^x\rangle|1\rangle$, then the amplitudes of $|\psi_1^x\rangle$ are degree-$T$ multilinear polynomials
in the input $x$, by Corollary 2.4. Then inspecting the postselected state

\[ \frac{|\psi_1^x\rangle|1\rangle}{\sqrt{\langle \psi_1^x | \psi_1^x \rangle}}, \]

we see that the squared amplitudes are rational functions of $x$ of degree $2T$ in $x$, that is, each
squared amplitude can be expressed as $p(x)/q(x)$, where both numerator and denominator are
polynomials of degree at most $2T$. Moreover, all squared amplitudes have the same denominator
$q(x) = \langle \psi_1^x | \psi_1^x \rangle$. Thus in the final decision measurement, the acceptance probability is a degree-$2T$
rational function of $x$.

This may be a useful framework for designing other rational approximations. In fact, one can
show the quantum approach to rational approximation always gives an essentially optimal degree:
if $f$ is 1/3-approximated by a rational function of degree $d$, then $f$ is computed by a quantum query
algorithm with postselection, making $O(d)$ queries [127], and vice versa. Such a tight,
constant-factor relation between query complexity and degree is known to be false for the case of the usual
(non-postselected) quantum query algorithms and the usual approximate degree [15].

4.4 Jackson's theorem 

In this section we describe how a classic result in approximation theory, Jackson's Theorem, can 
be proved using quantum ideas. 

The approximation of complicated functions by simpler ones, such as polynomials, has interested
mathematicians for centuries. A fundamental result of this type is Weierstrass's Theorem from
1885 [123], which states that any continuous function $g : [0,1] \to \mathbb{R}$ can be arbitrarily
well-approximated by polynomials. For any function $f : [0,1] \to \mathbb{R}$, let $\|f\|_\infty = \sup_{x \in [0,1]} |f(x)|$; then
Weierstrass's Theorem states that for any continuous $g$ and any $\varepsilon > 0$, there exists a polynomial
$p(x)$ such that $\|p - g\|_\infty < \varepsilon$.

In 1912 Bernstein gave a simple construction of such polynomials, which can be described in a
probabilistic fashion [28, 13]. Fix a continuous function $g$ on $[0,1]$ and a value $n \ge 1$, and consider
the following algorithm: flip $n$ coins, each taking value 1 with probability $x$. Let $t \in \{0, \ldots, n\}$
be the number of 1s observed in the resulting string $X \in \{0,1\}^n$. Output $g(t/n)$. Note that the
expected value of the random variable $t/n$ is exactly $x$, and its standard deviation is $O(1/\sqrt{n})$.
Consider the expected value of the output of this procedure, given a bias $x$:

\[ B_{g,n}(x) = \mathbb{E}[g(|X|/n)] = \sum_{t=0}^{n} \Pr[|X| = t] \cdot g(t/n). \]

We have

\[ \Pr[|X| = t] = \binom{n}{t} x^t (1-x)^{n-t}, \]

hence $B_{g,n}(x)$ is a polynomial in $x$ of degree at most $n$. Moreover, $B_{g,n}(x)$ is intuitively a good
estimator for $g(x)$ since $t/n$ is usually close to $x$. To quantify this, let $\omega_\delta(g)$, the modulus of
continuity of $g$ at scale $\delta$, be the supremum of $|g(x) - g(y)|$ over all $x, y \in [0,1]$ such that $|x - y| \le \delta$.
Then it can be shown [111] that $\|B_{g,n} - g\|_\infty = O(\omega_{1/\sqrt{n}}(g))$. This is not too surprising since we
expect the fraction of heads observed to be concentrated in an interval of length $O(1/\sqrt{n})$ around
$x$. The interval $[0,1]$ is compact, so $g$ is uniformly continuous and $\omega_\delta(g) \to 0$ as $\delta \to 0$. Thus the
Bernstein polynomials yield a quantitative refinement of Weierstrass's Theorem.
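Bernstein's construction is easy to evaluate directly. The following sketch computes $B_{g,n}(x)$ from the binomial formula above; the function name and the test function $g(x) = |x - 1/2|$ are choices made for this illustration.

```python
from math import comb

def bernstein(g, n, x):
    """B_{g,n}(x) = sum_t C(n,t) x^t (1-x)^(n-t) g(t/n): expected output of the coin-flip procedure."""
    return sum(comb(n, t) * x**t * (1.0 - x)**(n - t) * g(t / n) for t in range(n + 1))

if __name__ == "__main__":
    g = lambda u: abs(u - 0.5)   # 1-Lipschitz, so omega_delta(g) = delta for delta <= 1/2
    for n in (25, 100, 400):
        # Error at the kink x = 1/2, to be compared with the scale 1/sqrt(n).
        print(n, bernstein(g, n, 0.5) - g(0.5), n ** -0.5)
```

For this $g$ the error at $x = 1/2$ shrinks roughly like $1/\sqrt{n}$, which is the scale appearing in the bound $O(\omega_{1/\sqrt{n}}(g))$.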

Around the same time as Bernstein's work, an improved result was shown by Jackson [62]:

Theorem 4.6 (Jackson) If $g$ is continuous, then for all $n \ge 1$ there exists a polynomial $J_{g,n}$ of
degree $n$ such that $\|J_{g,n} - g\|_\infty = O(\omega_{1/n}(g))$.

In Jackson's theorem the quality of approximation is based on the maximum fluctuation of $g$ at
a much smaller scale than Bernstein's ($1/n$ instead of $1/\sqrt{n}$). Up to the constant factor, Jackson's
Theorem is optimal for approximation guarantees based on the modulus of continuity. The original
proof used trigonometric ideas. Drucker and de Wolf [41] gave a proof of Jackson's Theorem that
closely follows Bernstein's idea, but replaces his classical estimation procedure with the quantum
counting algorithm of [29]. As mentioned in Section 2.3, with $M$ queries this algorithm produces
an estimate $\tilde{t}$ of the Hamming weight $t = |X|$ of a string $X \in \{0,1\}^N$, such that for every integer
$j \ge 1$ we have $\Pr[|\tilde{t} - t| \ge jN/M] = O(1/j)$. For our purposes we need an even sharper estimate.
To achieve this, let Count'$(X, M)$ execute Count$(X, M)$ five times, yielding estimates $\tilde{t}_1, \ldots, \tilde{t}_5$,
and output their median, denoted $\tilde{t}_{med}$.$^9$ Note that for $|\tilde{t}_{med} - t| \ge jN/M$ to hold, we must have
$|\tilde{t}_i - t| \ge jN/M$ for at least three of the trials $i \in \{1, \ldots, 5\}$. This happens with probability at
most $O(1/j^3)$, which implies

\[ \mathbb{E}[|\tilde{t}_{med} - t|] \le \sum_{j \ge 1} \frac{jN}{M} \cdot O(1/j^3) = O(N/M). \tag{6} \]
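The step from $O(1/j)$ to $O(1/j^3)$ is a union bound over the $\binom{5}{3} = 10$ ways to pick three bad trials. As a small sketch, assuming the five runs fail independently with the same probability $p$, the exact probability that at least three fail can be compared with $10p^3$; the function name is hypothetical.

```python
from math import comb

def median_failure_probability(p):
    """P[at least 3 of 5 independent trials fail], each failing with probability p."""
    return sum(comb(5, k) * p**k * (1.0 - p)**(5 - k) for k in range(3, 6))

if __name__ == "__main__":
    for p in (0.5, 0.1, 0.01):
        print(p, median_failure_probability(p), 10 * p**3)
```

With $p = O(1/j)$ this gives the cubic tail that makes the sum in (6) converge.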

With this estimator in hand, let us fix a continuous function $g$ and $n \ge 1$, and consider the following
algorithm $A(x)$ to estimate $g(x)$ for an unknown value $x$: flip $N = n^2$ independent coins, each taking
value 1 with probability $x$, yielding an $N$-bit string $X$. Set $M = \lfloor n/10 \rfloor$ and run Count'$(X, M)$,
yielding an estimate $\tilde{t}_{med}$, and output $g(\tilde{t}_{med}/N)$. This algorithm makes $5M \le n/2$ queries to $X$, so
it follows from Corollary 2.4 that its expected output value (viewed as a function of the $N$ bits of $X$)
is a multilinear polynomial of degree at most $n$. Define $J_{g,n}(x) = \mathbb{E}[A(x)]$, where the expectation is
taken both over the choice of $X$ and over the randomness in the output of the quantum counting
procedure. Note that $\mathbb{E}[X_{i_1} \cdots X_{i_d}] = x^d$, since the bits of $X$ are independent coin flips, each with
expectation $x$. Hence $J_{g,n}$ is a degree-$n$ univariate polynomial in $x$. We bound $|J_{g,n}(x) - g(x)|$ as
follows. Consider the random variable $\tilde{x} = \tilde{t}_{med}/N$. Using the definition of $\omega_{1/n}(g)$, we have

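\begin{align*}
|J_{g,n}(x) - g(x)| &= |\mathbb{E}[g(\tilde{x})] - g(x)| \\
&\le \mathbb{E}[|g(\tilde{x}) - g(x)|] \\
&\le \mathbb{E}[(n \cdot |\tilde{x} - x| + 1) \cdot \omega_{1/n}(g)] \\
&= \left( \frac{1}{n} \mathbb{E}[|\tilde{t}_{med} - xN|] + 1 \right) \cdot \omega_{1/n}(g) \\
&\le \left( \frac{1}{n} \left( \mathbb{E}[|\tilde{t}_{med} - t|] + \mathbb{E}[|t - xN|] \right) + 1 \right) \cdot \omega_{1/n}(g).
\end{align*}

Since $t = |X|$ has expectation $xN$ and variance $x(1-x)N$, we have $\mathbb{E}[|t - xN|] = O(\sqrt{N}) = O(n)$.
By (6) we have $\mathbb{E}[|\tilde{t}_{med} - t|] = O(N/M) = O(n)$. Plugging these two findings into our expression
yields $|J_{g,n}(x) - g(x)| = O(\omega_{1/n}(g))$, for all $x \in [0,1]$. This proves Jackson's Theorem.

9 A more careful analysis in [41] allows a median of three trials to be used.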
At the heart of quantum counting, one finds trigonometric ideas closely related to those used
in some classical proofs of Jackson's Theorem (see [41] for a discussion). Thus it is not really fair
to call the above a simplified proof. It does, however, show how Bernstein's elegant probabilistic
approach to polynomial approximation can be carried further with the use of quantum algorithms.

4.5 Separating strong and weak communication versions of PP 

The previous examples all used the connection between polynomials and quantum algorithms, more
precisely the fact that an efficient quantum (query) algorithm induces a low-degree polynomial. The
last example of this section uses a connection between polynomials and quantum communication
protocols. Communication complexity was introduced in Section 3.1. Like computational
complexity, communication complexity has a host of different models: one can study deterministic,
bounded-error, or non-deterministic protocols, and so on. Recall the complexity class PP from
Section 4.3. This class has two plausible analogues in the communication setting: an
unbounded-error version and a weakly-unbounded-error version. In the first, we care about the minimal $c$ such
that there is a $c$-bit randomized communication protocol computing $f : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}$
correctly with success probability strictly greater than 1/2 on every input (though the success
probability may be arbitrarily close to 1/2). In the second version, we care about the minimal $c$
such that there is a $c$-bit randomized protocol computing $f$ correctly with success probability at
least $1/2 + \beta$ on every input, for $\beta \ge 2^{-c}$. The motivation for the second version is that $c$ plays
the role that computation-time takes for the computational class PP, and in that class we
automatically have $\beta \ge 2^{-c}$: a time-$c$ Turing machine cannot flip more than $c$ fair coins, so all events
have probabilities that are multiples of $1/2^c$. Let UPP$(f)$ and PP$(f)$ correspond to the minimal
communication $c$ in the unbounded and weakly-unbounded cases, respectively. Both measures were
introduced by Babai et al. [19], who asked whether they were approximately equal. (One direction
is clear: UPP$(f) \le$ PP$(f)$.)

These two complexity measures are interesting as models in their own right, but are all the more
interesting because they each correspond closely to fundamental complexity measures of Boolean
matrices. First, UPP$(f)$ equals (up to a small additive constant) the log of the sign-rank of $f$ [101].
This is the minimal rank among all $2^n \times 2^n$ matrices $M$ satisfying that the sign of the entry $M_{x,y}$
equals $(-1)^{f(x,y)}$ for all inputs. Second, PP$(f)$ is essentially the discrepancy bound [72], which in
turn is essentially the margin complexity of $f$ [86, 85]. The latter is an important and well-studied
concept from computational learning theory. Accordingly, an example where UPP$(f) \ll$ PP$(f)$
also separates sign-rank on the one hand from discrepancy and margin complexity on the other.

Simultaneously but independently, Buhrman et al. [33] and Sherstov [115] (later improved
in [116]) found functions where UPP$(f)$ is exponentially smaller than PP$(f)$, answering the question
from [19]. We will give the proof of [33] here. Our quantum tool is the following lemma, first proved
(implicitly) in [104] and made more explicit in [73, Section 5].

Lemma 4.7 (Razborov) Consider a quantum communication protocol on $m$-bit inputs $x$ and $y$
that communicates $q$ qubits, with outputs 0 and 1 ("reject" and "accept"), and acceptance
probabilities denoted by $P(x,y)$. For $i \in \{0, \ldots, m/4\}$, define

\[ P(i) = \mathbb{E}_{|x| = |y| = m/4,\ |x \wedge y| = i} [P(x,y)], \]

where the expectation is taken uniformly over all $x, y \in \{0,1\}^m$ that each have Hamming weight $m/4$
and that have intersection size $i$. For every $d \le m/4$ there exists a univariate degree-$d$ polynomial
$p$ (over the reals) such that $|P(i) - p(i)| \le 2^{-d/4 + 2q}$ for all $i \in \{0, \ldots, m/8\}$.
If we choose degree $d = 8q + 4\log(1/\varepsilon)$, then $p$ approximates $P$ to within an additive $\varepsilon$ for all
$i \in \{0, \ldots, m/8\}$. This allows us to translate quantum protocols to polynomials. The connection is
not as clean as in the case of quantum query algorithms (Corollary 2.4): the relation between the
degree and the quantum communication complexity is less tight, and the resulting polynomial only
approximates certain average acceptance probabilities instead of exactly representing the acceptance
probability on each input.
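The choice of degree can be verified in one line, assuming (as the bound $2^{-d/4+2q}$ suggests) that the logarithm is taken base 2:

```latex
\[
  d = 8q + 4\log(1/\varepsilon)
  \;\Longrightarrow\;
  2^{-d/4 + 2q}
  = 2^{-2q - \log(1/\varepsilon) + 2q}
  = 2^{-\log(1/\varepsilon)}
  = \varepsilon .
\]
```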

Lemma 4.7 allows us to prove lower bounds on quantum communication complexity by proving
lower bounds on polynomial degree. The following result [43, 112] is a very useful degree bound:

Theorem 4.8 (Ehlich and Zeller, Rivlin and Cheney) Let $p : \mathbb{R} \to \mathbb{R}$ be a polynomial such
that $b_1 \le p(i) \le b_2$ for every integer $0 \le i \le n$, and its derivative has $|p'(x)| \ge c$ for some real
$0 \le x \le n$. Then $\deg(p) \ge \sqrt{cn/(c + b_2 - b_1)}$.

The function and its unbounded-error protocol To obtain the promised separation, we
use a distributed version of the ODD-MAX-BIT function of Beigel [24]. Let $x, y \in \{0,1\}^n$, and
$k = \max\{i \in [n] : x_i = y_i = 1\}$ be the rightmost position where $x$ and $y$ both have a 1 (set $k = 0$ if
there is no such position). Define $f(x,y)$ to be the least significant bit of $k$, i.e., whether this $k$ is
odd or even. Buhrman et al. [33] proved:

Theorem 4.9 (BVW) For the distributed ODD-MAX-BIT function we have UPP$(f) = O(\log n)$
and PP$(f) = \Omega(n^{1/3})$.

The upper bound is easy: For $i \in [n]$, define probabilities $p_i = c 2^i$, where $c = 1/\sum_{i=1}^{n} 2^i$
is a normalizing constant. Consider the following protocol. Alice picks a number $i \in [n]$ with
probability $p_i$ and sends over $(i, x_i)$ using $\lceil \log n \rceil + 1$ bits. If $x_i = y_i = 1$ then Bob outputs the
least significant bit of $i$, otherwise he outputs a fair coin flip. This computes $f$ with positive (but
exponentially small) bias. Hence UPP$(f) \le \lceil \log n \rceil + 1$.
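The bias of this protocol can be computed exactly for small $n$. The sketch below is a literal classical evaluation of the protocol as described, using exact rational arithmetic; the function names are hypothetical. Positions are 1-based and $f = 1$ means that $k$ is odd.

```python
from fractions import Fraction
from itertools import product

def odd_max_bit(x, y):
    """f(x, y): least significant bit of k = max{i : x_i = y_i = 1}, with k = 0 if there is none."""
    k = max((i for i in range(1, len(x) + 1) if x[i - 1] == 1 and y[i - 1] == 1), default=0)
    return k % 2

def success_probability(x, y):
    """Exact probability that the protocol outputs f(x, y)."""
    n = len(x)
    c = Fraction(1, sum(2 ** i for i in range(1, n + 1)))  # normalizing constant
    target = odd_max_bit(x, y)
    total = Fraction(0)
    for i in range(1, n + 1):
        p_i = c * 2 ** i
        if x[i - 1] == 1 and y[i - 1] == 1:
            total += p_i * (1 if i % 2 == target else 0)   # Bob outputs lsb(i)
        else:
            total += p_i * Fraction(1, 2)                  # Bob outputs a fair coin
    return total

if __name__ == "__main__":
    n = 4
    worst = min(success_probability(x, y) - Fraction(1, 2)
                for x in product([0, 1], repeat=n) for y in product([0, 1], repeat=n)
                if any(a and b for a, b in zip(x, y)))
    print(worst)  # smallest bias over inputs that share at least one 1
```

Whenever $x$ and $y$ share a 1, the weight $p_k$ of the top common position outweighs all lower positions combined, so the bias is at least $c$. In this literal reading, inputs with no common 1 give success probability exactly 1/2; handling them presumably needs a slight bias toward output 0, which is an assumption here and is not spelled out in the text.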

Weakly-unbounded-error lower bound It will be convenient to lower-bound the
complexity of quantum protocols computing $f$ with weakly-unbounded error, since this allows us to use
Lemma 4.7.$^{10}$ Consider a quantum protocol with $q$ qubits of communication that computes $f$ with
bias $\beta > 0$. Let $\beta(x,y) = P(x,y) - 1/2$. Then $\beta(x,y) \ge \beta$ if $f(x,y) = 1$, and $\beta(x,y) \le -\beta$ if
$f(x,y) = 0$. Our goal is to lower-bound $q + \log(1/\beta)$. We will do that below by a process that
iteratively fixes some of the input bits, applying Lemma 4.7 to each intermediate function to find
a good fixing, in a way that produces a bounded function with larger and larger "swings."

Define $d = \lceil 8q + 4\log(2/\beta) \rceil$ and $m = 32d^2 + 1$. Assume for simplicity that $2m$ divides $n$. We
partition $[n]$ into $n/2m$ consecutive intervals (or "blocks"), each of length $2m$. In the first interval
(from the left), fix the bits $x_i$ and $y_i$ to 0 for even $i$; in the second, fix $x_i$ and $y_i$ to 0 for odd $i$; in
the third, fix $x_i$ and $y_i$ to 0 for even $i$, etc. In the $j$-th interval there are $m$ unfixed positions left
for each party. Let $x^{(j)}$ and $y^{(j)}$ denote the corresponding $m$-bit strings in $x$ and $y$, respectively.
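The alternating fixing pattern is easy to get wrong by one, so here is a small sketch that lists the unfixed positions of each block. The helper name `free_positions` is hypothetical, and the toy values of $n$ and $m$ in the usage note are chosen for readability rather than taken from the proof (where $m = 32d^2 + 1$).

```python
def free_positions(n, m):
    """Unfixed 1-based positions in each block of length 2m.

    In odd-numbered blocks the even positions are fixed to 0, and in
    even-numbered blocks the odd positions are fixed to 0.
    """
    if n % (2 * m) != 0:
        raise ValueError("this sketch assumes that 2m divides n")
    blocks = []
    for j in range(1, n // (2 * m) + 1):
        start = (j - 1) * 2 * m + 1  # first position of block j
        block = range(start, start + 2 * m)
        # The surviving positions have the same parity as the block index j.
        blocks.append([i for i in block if i % 2 == j % 2])
    return blocks
```

With `free_positions(12, 3)` the two blocks keep positions `[1, 3, 5]` and `[8, 10, 12]`. Every free position in block $j$ has the parity of $j$, which is what forces the value of $f$ once the rightmost intersection lies in block $j$.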

10 We could just analyze classical protocols and use the special case of Razborov's result that applies to classical
protocols. However, the classical version of Razborov's lemma was not known prior to [104], and arguably would
not have been discovered if it were not for the more general quantum version. We would end up with the same
communication lower bound anyway, since quantum and classical weakly-unbounded error complexity turn out to be
essentially the same [72, 61].
We will define inductively, for all $j = 1, 2, \ldots, n/2m$, particular strings $x^{(j)}$ and $y^{(j)}$ as follows.
Let $X^j$ and $Y^j$ denote $n$-bit strings where the first $j$ blocks are set to $x^{(1)}, \ldots, x^{(j)}$ and $y^{(1)}, \ldots, y^{(j)}$,
respectively, and all the other blocks are set to $0^{2m}$. In particular, $X^0$ and $Y^0$ are all zeros. We
will define $x^{(j)}$ and $y^{(j)}$ so that

$\beta(X^j, Y^j) \ge 2^j\beta$ or $\beta(X^j, Y^j) \le -2^j\beta$

depending on whether $j$ is odd or even. This holds automatically for $j = 0$, which is the base case
of the induction.

Now assume $x^{(1)}, \ldots, x^{(j-1)}$ and $y^{(1)}, \ldots, y^{(j-1)}$ are already defined on previous steps. On the
current step, we have to define $x^{(j)}$ and $y^{(j)}$. Without loss of generality assume that $j$ is odd,
thus we have $\beta(X^{j-1}, Y^{j-1}) \le -2^{j-1}\beta$. Consider each $i = 0, 1, \ldots, m/4$. Run the protocol on
the following distribution: $x^{(j)}$ and $y^{(j)}$ are chosen randomly subject to each having Hamming
weight $m/4$ and having intersection size $i$; the blocks with indices smaller than $j$ are fixed (on
previous steps), and the blocks with indices larger than $j$ are set to zero. Let $P(i)$ denote the
expected value (over this distribution) of $\beta(x,y)$ as a function of $i$. Note that for $i = 0$ we have
$P(i) = \beta(X^{j-1}, Y^{j-1}) \le -2^{j-1}\beta$. On the other hand, for each $i > 0$ the expectation is taken over
$x, y$ with $f(x,y) = 1$, because the rightmost intersecting point is in the $j$-th interval and hence odd
(the even indices in the $j$-th interval have all been fixed to 0). Thus $P(i) \ge \beta$ for those $i$. Now
assume, by way of contradiction, that $\beta(X^j, Y^j) < 2^j\beta$ for all $x^{(j)}, y^{(j)}$ and hence $P(i) < 2^j\beta$ for
all such $i$. By Lemma 4.7, for our choice of $d$, we can approximate $P(i)$ to within additive error of
$\beta/2$ by a polynomial $p$ of degree $d$. (Apply Lemma 4.7 to the protocol obtained from the original
protocol by fixing all bits outside the $j$-th block.) Let $r$ be the degree-$d$ polynomial

$r = \frac{p - \beta/2}{2^{j-1}\beta}$.

From the properties of $P$ and the fact that $p$ approximates $P$ up to $\beta/2$, we see that $r(0) \le -1$
and $r(i) \in [0,2]$ for all $i \in [m/8]$. But then by the degree lower bound of Theorem 4.8,

$\deg(r) \ge \sqrt{(m/8)/4} = \sqrt{d^2 + 1/32} > d$,

which is a contradiction. Hence there exists an intersection size $i \in [m/8]$ where $P(i) \ge 2^j\beta$. Thus
there are particular $x^{(j)}, y^{(j)}$ with $\beta(X^j, Y^j) \ge 2^j\beta$, concluding the induction step.
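The arithmetic behind the contradiction can be verified mechanically. The sketch below is only an illustration: the function names are hypothetical, and it assumes that $\log$ denotes the base-2 logarithm. It computes $d$ and $m$ for given $q$ and $\beta$, then checks with exact fractions that $(m/8)/4 = d^2 + 1/32$, so that the square root strictly exceeds $d$.

```python
from fractions import Fraction
from math import ceil, log2


def parameters(q, beta):
    """d and m as defined in the text, for q qubits and bias beta (log base 2 assumed)."""
    d = ceil(8 * q + 4 * log2(2 / beta))
    m = 32 * d * d + 1
    return d, m


def degree_bound_squared(m):
    """The quantity (m/8)/4 that appears under the square root, as an exact fraction."""
    return Fraction(m, 8) / 4
```

For example, `parameters(1, 0.25)` gives $d = 20$ and $m = 12801$, and `degree_bound_squared(12801)` equals $400 + 1/32$, which is strictly larger than $d^2 = 400$.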

For $j = n/2m$ we obtain $|\beta(X^j, Y^j)| \ge 2^{n/2m}\beta$. But for every $x, y$ we have $|\beta(x,y)| \le 1/2$
because $1/2 + \beta(x,y) = P(x,y) \in [0,1]$, hence $1/2 \ge 2^{n/2m}\beta$. This implies $2m\log(1/(2\beta)) \ge n$, and
therefore

$(q + \log(1/\beta))^3 \ge (q + \log(1/\beta))^2 \log(1/\beta) = \Omega(m\log(1/\beta)) = \Omega(n)$.

This allows us to conclude $\mathrm{PP}(f) = \Omega(n^{1/3})$, which is exponentially larger than $\mathrm{UPP}(f)$.
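For readers who want the last chain of inequalities spelled out, the following worked derivation fills in the intermediate steps. It assumes base-2 logarithms and uses that a bias satisfies $\beta \le 1/2$, so $\log(1/\beta) \ge 1$; the explicit constant 5 is just one convenient choice.

```latex
\begin{align*}
\tfrac12 \ge 2^{n/2m}\beta
  &\;\Longrightarrow\; \frac{n}{2m} \le \log\frac{1}{2\beta}
   \;\Longrightarrow\; n \le 2m\log\frac{1}{2\beta} \le 2m\log\frac{1}{\beta},\\
d = \lceil 8q + 4\log(2/\beta)\rceil
  &\;\Longrightarrow\; d \le 8q + 4\log\frac{1}{\beta} + 5
   = O\!\left(q + \log\frac{1}{\beta}\right),\\
m = 32d^2 + 1
  &\;\Longrightarrow\; m = O\!\left(\Bigl(q + \log\frac{1}{\beta}\Bigr)^{2}\right),\\
n \le 2m\log\frac{1}{\beta}
  &\;\Longrightarrow\; n = O\!\left(\Bigl(q + \log\frac{1}{\beta}\Bigr)^{3}\right)
   \;\Longrightarrow\; q + \log\frac{1}{\beta} = \Omega\!\left(n^{1/3}\right).
\end{align*}
```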



5 Other applications 

In this section we go over a few further examples of the use of quantum techniques in non-quantum 
results, examples that do not fit in the two broad categories of the previous two sections. First 
we give two examples of classical results that were both inspired by earlier quantum proofs, but 
do not explicitly use quantum techniques.$^{11}$ Finally, in Section 5.3 we give a brief guide to further
literature with more examples. 

11 This is reminiscent of a famous metaphor in Ludwig Wittgenstein's Tractatus Logico-Philosophicus (6.54), about
the ladder one discards after having used it to climb to a higher level. 
5.1 The relational adversary

In Section 2.3 we described the polynomial method for proving lower bounds for quantum query
complexity. This method has a strength which is also a weakness: it applies even to a stronger (and 
less physically meaningful) model of computation where we allow any linear transformation on the 
state space, not just unitary ones. As a result, it does not always provide the strongest possible 
lower bound for quantum query algorithms. 

Ambainis [14, 15], building on an earlier method of Bennett et al. [27], addressed this problem
by providing a general method for quantum lower bounds, the quantum adversary, which exploits
unitarity in a crucial way and which in certain cases yields a provably better bound than the
polynomial method [15]. Surprisingly, Aaronson [3] was able to modify Ambainis's argument to obtain
a new method, the relational adversary, for proving classical randomized query lower bounds. He
used this method to give improved lower bounds on the complexity of the "local search" problem,
in which one is given a real-valued function $F$ defined on the vertices of a graph $G$, and must locate
a local minimum of $F$. In this section we state Ambainis's lower-bound criterion, outline its proof,
and describe how Aaronson's method follows a similar outline.$^{12}$

Recall from Section 2.3 that a quantum query algorithm is a sequence of unitary operations

$U_T O_x U_{T-1} O_x \cdots O_x U_1 O_x U_0$,

applied to the fixed starting state $|0 \ldots 0\rangle$, where the basic "query transformation" $O_x$ depends on
the input $x$ and $U_0, U_1, \ldots, U_T$ are arbitrary unitaries. Ambainis invites us to look simultaneously
at the evolution of our quantum state under all possible choices of $x$; formally, we let $|\psi_x^t\rangle$ denote the
state at time $t$ (i.e., after applying $O_x$ for the $t$-th time) under input $x$. In particular, $|\psi_x^0\rangle = |0 \ldots 0\rangle$
for all $x$ (and $\langle\psi_x^0|\psi_y^0\rangle = 1$ for each $x, y$). Now if the algorithm computes the Boolean function $f$
with success probability 2/3 on every input, then for any pair $x \in f^{-1}(0)$, $y \in f^{-1}(1)$ the final
measurement must accept $x$ and $y$ with probabilities differing by at least 1/3. It is not hard to verify that
this implies $|\langle\psi_x^T|\psi_y^T\rangle| \le 17/18$.$^{13}$ This suggests that, for any given relation $R \subseteq f^{-1}(0) \times f^{-1}(1)$
(which one may regard as a bipartite graph), we consider the progress measure

$S_t = \sum_{(x,y) \in R} |\langle\psi_x^t|\psi_y^t\rangle|$

as a function of $t$. By our observations, initially we have $S_0 = |R|$, and in the end we must have
$S_T \le (17/18) \cdot |R|$. Also, crucially, the progress measure is unaffected by each application of a
unitary $U_t$, since each $U_t$ is independent of the input and unitary transformations preserve inner
products.

If we can determine an upper bound $\Delta$ on the change $|S_{t+1} - S_t|$ in the progress measure at
each step, we can conclude that the number $T$ of queries is at least $|R|/(18\Delta)$. Ambainis [14, 15]
provides a condition on $R$ that allows us to derive such a bound:

Theorem 5.1 (Ambainis) Suppose that the relation R satisfies 

12 In both cases we state a simplified ("unweighted") version of the lower-bound method in question, which conveys 
the essence of the technique. After Ambainis's original paper, a version of the lower bound was formulated that allows 
even negative weights [59]. Reichardt [108] recently proved that this general adversary bound in fact characterizes 
quantum query complexity. 

13 Contrapositively, and in greater generality, if $|\langle\psi_1|\psi_2\rangle| \ge 1 - \epsilon$ then under any measurement, $|\psi_1\rangle$ and $|\psi_2\rangle$ have
acceptance probabilities differing by at most $\sqrt{2\epsilon}$ (see [98, Section 9.2]).
(i) each $x \in f^{-1}(0)$ appearing in $R$ appears at least $m_0$ times in $R$;

(ii) each $y \in f^{-1}(1)$ appearing in $R$ appears at least $m_1$ times in $R$;

(iii) for each $x \in f^{-1}(0)$ and $i \in [n]$, there are at most $\ell_0$ inputs $y \in f^{-1}(1)$ such that $(x,y) \in R$
and $x_i \ne y_i$;

(iv) for each $y \in f^{-1}(1)$ and $i \in [n]$, there are at most $\ell_1$ inputs $x \in f^{-1}(0)$ such that $(x,y) \in R$
and $x_i \ne y_i$.

Let the progress measure $S_t$ be defined relative to a fixed quantum algorithm as above. Then for all
$t \ge 0$,

$|S_{t+1} - S_t| = O\left(\sqrt{\frac{\ell_0}{m_0} \cdot \frac{\ell_1}{m_1}} \cdot |R|\right)$, and therefore $T = \Omega\left(\sqrt{\frac{m_0}{\ell_0} \cdot \frac{m_1}{\ell_1}}\right)$.

The art in applying Ambainis's technique lies in choosing the relation $R$ carefully to maximize
this quantity. Intuitively, conditions (i)-(iv) imply that $|S_{t+1} - S_t|$ is small relative to $|R|$ by
bounding the "distinguishing ability" of any query.

Every classical bounded-error algorithm is also a quantum bounded-error query algorithm, so
the above lower bound also applies in the classical case. However, there are cases where this
gives an inferior bound. For example, for the promise problem of inverting a permutation, the
above technique yields a query bound of $\Omega(\sqrt{n})$, which matches the true quantum complexity
of the problem, while the classical randomized complexity is $\Omega(n)$. In this and similar cases,
the particular relation $R$ used in applying Theorem 5.1 for a quantum lower bound is such that
$\max\{m_0/\ell_0, m_1/\ell_1\}$ gives a good estimate of the classical query complexity. This led Aaronson to
prove in [3] a classical analogue of Ambainis's lower bound, in which the geometric mean of $m_0/\ell_0$
and $m_1/\ell_1$ is indeed replaced with their maximum.

A sketch of Aaronson's proof is as follows. Fixing the relation $R$, we use Yao's minimax
principle to show that, if there is a randomized $T$-query bounded-error algorithm for computing
$f$, then there is a deterministic algorithm $A$ succeeding with high probability on a specific input
distribution determined by $R$ (to be precise, pick a uniformly random pair $(x,y) \in R$ and select $x$
or $y$ with equal probability). We now follow Ambainis, and consider for each input $x$ and $t \le T$
the state $v_{t,x}$ (which is now a fixed, classical state) of the algorithm $A$ after $t$ steps on input $x$. Let
$I_{t,x,y}$ equal 1 if inputs $x$ and $y$ have not been distinguished by $A$ after $t$ steps, otherwise $I_{t,x,y} = 0$.
Define $S_t = \sum_{(x,y) \in R} I_{t,x,y}$ as our progress measure.$^{14}$

Similarly to the quantum adversary, we have $S_0 = |R|$, and Aaronson argues that the success
condition of $A$ implies $S_T \le (1 - \Omega(1))|R|$. A combinatorial argument then yields the following
result bounding the maximum possible change $|S_{t+1} - S_t|$ after one (classical) query:

Theorem 5.2 (Aaronson) Suppose that the relation $R$ obeys conditions (i)-(iv) in Theorem 5.1.
Let the progress measure $S_t$ be defined relative to a deterministic algorithm $A$. Then for all $t \ge 0$,

$|S_{t+1} - S_t| = O\left(\min\left\{\frac{\ell_0}{m_0}, \frac{\ell_1}{m_1}\right\} \cdot |R|\right)$, and therefore $T = \Omega\left(\max\left\{\frac{m_0}{\ell_0}, \frac{m_1}{\ell_1}\right\}\right)$.
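To see how the quantum and classical bounds differ on a concrete relation, the sketch below computes the four parameters of conditions (i)-(iv) and evaluates both expressions, ignoring the constants hidden in the $\Omega$ notation. The example relation, which pairs the all-zero input with each input of Hamming weight 1 for the $n$-bit OR function, is a standard textbook choice and is not taken from this section; all function names are hypothetical.

```python
from math import sqrt


def adversary_parameters(relation, n):
    """Compute (m0, m1, l0, l1) for a list of distinct pairs (x, y) of n-bit tuples."""
    xs = {x for x, _ in relation}
    ys = {y for _, y in relation}
    # (i), (ii): minimum number of times an x (resp. y) appears in the relation.
    m0 = min(sum(1 for a, _ in relation if a == x) for x in xs)
    m1 = min(sum(1 for _, b in relation if b == y) for y in ys)
    # (iii), (iv): maximum number of partners that differ in a fixed position i.
    l0 = max(sum(1 for a, b in relation if a == x and a[i] != b[i])
             for x in xs for i in range(n))
    l1 = max(sum(1 for a, b in relation if b == y and a[i] != b[i])
             for y in ys for i in range(n))
    return m0, m1, l0, l1


def or_relation(n):
    """Pairs (0^n, e_i) for the OR function, where e_i has a single 1 in position i."""
    zero = (0,) * n
    return [(zero, tuple(1 if k == i else 0 for k in range(n))) for i in range(n)]


def quantum_bound(m0, m1, l0, l1):
    """Geometric-mean expression of Theorem 5.1 (assumes l0, l1 > 0)."""
    return sqrt((m0 / l0) * (m1 / l1))


def classical_bound(m0, m1, l0, l1):
    """Maximum expression of Theorem 5.2 (assumes l0, l1 > 0)."""
    return max(m0 / l0, m1 / l1)
```

For $n = 16$ the parameters come out as $(m_0, m_1, \ell_0, \ell_1) = (16, 1, 1, 1)$, so the first expression evaluates to $4 = \sqrt{n}$ and the second to $16 = n$.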

14 Actually, Aaronson [3] defines an increasing function, but we modify this to show the similarity with Ambainis's
proof. We note that if the states $v_{t,x}$ are written as quantum amplitude vectors, and the state of the algorithm $A$
records all the information it sees, $I_{t,x,y}$ can actually be written as an inner product of quantum states just as in
Ambainis's proof.
The details of Aaronson's proof are somewhat different from those of Theorem 5.1 and there
is no explicit use of quantum states, but the spirit is clearly similar, illustrating that the quantum 
query model is a sufficiently natural generalization of the classical model for ideas to flow in both 
directions. Subsequently, Laplante and Magniez [79] gave a different treatment of this based on 
Kolmogorov complexity, which brings out the analogy between the quantum and classical adversary 
bounds even more closely. 

5.2 Proof systems for the shortest vector problem 

A lattice is an additive subgroup of $\mathbb{R}^n$ consisting of all integer combinations of a linearly
independent set of $n$ vectors. It can be shown that for every lattice $L$, there exists a value $\lambda(L) > 0$,
the minimum (Euclidean) distance of the lattice, such that: (i) any two distinct $x, y \in L$ are at
distance at least $\lambda(L)$ from each other; (ii) there exists $x \in L$ such that $\|x\| = \lambda(L)$. Lattice
vectors of length $\lambda(L)$ are called "shortest (nonzero) vectors" for $L$, and the problem of computing
the minimum lattice distance is also known as the "shortest vector problem" (SVP).

The problem of approximating $\lambda(L)$ to within a multiplicative factor $\gamma(n)$ can be equivalently
formulated as a promise problem called $\mathrm{GapSVP}_{\gamma(n)}$, in which we are given a basis for a lattice $L$,
with the "promise" that either $\lambda(L) \le 1$ or $\lambda(L) > \gamma(n)$, and we must determine which case holds.
A related problem is the "closest vector problem" (CVP), in which we are given a basis for a lattice
$L$, and a "target" vector $v \notin L$, and would like to approximate the distance $d(v,L)$ from $v$ to the
closest vector in the lattice. In the promise problem $\mathrm{GapCVP}_{\gamma(n)}$, we are given $L$ and $v$ and must
distinguish the case where $d(v,L) \le 1$ from the case where $d(v,L) > \gamma(n)$. It is known [53] that
$\mathrm{GapSVP}_{\gamma(n)}$ reduces to $\mathrm{GapCVP}_{\gamma(n)}$ for any approximation factor $\gamma(n)$.

Approximate solutions to closest and shortest vector problems have numerous applications 
in pure and applied mathematics. However, $\mathrm{GapSVP}_{\gamma(n)}$ has been proven intractable even for
super-constant approximation factors $\gamma(n)$, albeit under an assumption somewhat stronger than
$\mathrm{P} \ne \mathrm{NP}$ [71, 57] (namely that NP-complete problems are not solvable by a randomized algorithm
in time $2^{\mathrm{polylog}(n)}$). Even computing an estimate with an exponential approximation guarantee in
polynomial time is a highly nontrivial task, achieved by the celebrated LLL algorithm [82] ; the best 
current polynomial-time algorithm gives only slightly subexponential approximation factors [11]. A 
nearly exponential gap remains between the efficiently achievable approximation ratios and those for 
which we have complexity-theoretic evidence of hardness. Also, despite intense effort, no quantum 
algorithms have been found which improve significantly on their classical counterparts. 

A breakthrough work of Ajtai [8] initiated a sequence of papers [9] 11051 ED H07[ QUI 1102] which 
gave a strong motivation to better understand the approximability of various lattice problems. 
These papers build cryptosystems which, remarkably, possess strong average-case security, based 
only on the assumption that certain lattice problems are hard to approximate within polynomial 
factors in the worst case.$^{15}$ For a fuller discussion of lattice-based cryptography, see [92], [93].

While these hardness assumptions remain plausible, another sequence of papers [77[ [20] [ST] 
[6] [7] has given evidence that the shortest vector problem is not NP-hard to approximate within 
polynomial factors (note that a problem may be intractable without being NP-hard). The most 

15 Intriguingly for us, Regev [106] gave one such cryptosystem based on a quantum hardness assumption, and
recently Peikert [102] built upon ideas in [106] to give a new system based on hardness assumptions against classical 
algorithms. This is another example of a classical result based on an earlier quantum result. However, the connection 
is less tight than for the proof systems of Aharonov and Regev, since Peikert replaced the quantum aspect of Regev's 
earlier work by a fairly unrelated classical approach. 
recent, due to Aharonov and Regev [7], is a proof system for $\mathrm{GapCVP}_{c\sqrt{n}}$ for sufficiently large
$c > 0$. That is, if we are given a basis for a lattice $L$ and a target vector $v$, with the promise that
either $d(v,L) \le 1$ or $d(v,L) > c\sqrt{n}$, there is a proof system to convince us which of the two cases
holds. One direction is simple: if $d(v,L) \le 1$, then exhibiting a lattice vector within distance 1 of $v$
proves this (the promise is not needed in this case). The new contribution of [7] is to supply proofs for the case
where $d(v,L) > c\sqrt{n}$, showing that $\mathrm{GapCVP}_{c\sqrt{n}}$ is in $\mathrm{NP} \cap \mathrm{coNP}$ (technically, the promise-problem
analogue of this class). It follows that $\mathrm{GapCVP}_{c\sqrt{n}}$ cannot be NP-hard unless the Polynomial
Hierarchy collapses; see [7] for details of this implication.

The proof system of [7] is, in Aharonov and Regev's own description, a "dequantization" of an 
earlier, quantum Merlin-Arthur (QMA) proof system by the same authors [6] for $\mathrm{GapCVP}_{c\sqrt{n}}$.$^{16}$
In a QMA protocol, a computationally unbounded but untrustworthy prover (Merlin) sends a 
quantum state to a polynomially-bounded quantum verifier (Arthur), who then decides to accept 
or reject without further communication; unlike NP, QMA allows a small probability of error. 
Although Aharonov and Regev analyze their second proof system in a completely classical way, the 
two papers offer an interesting case study of how quantum ideas can be adapted to the classical 
setting. We now describe some of the ideas involved; we begin with the quantum proof system 
of [6], then discuss the classical protocol of [7]. 

The quantum proof system In the QMA protocol, Prover wants to convince Verifier that
$d(v,L) > c\sqrt{n}$ for some large $c > 0$. Central to the proof system is the following geometric
idea: for our given lattice $L$, one defines a function $F(x) : \mathbb{R}^n \to [0, \infty)$ that is lattice-periodic
(i.e., $F(x + y) = F(x)$ for all $x \in \mathbb{R}^n$ and $y \in L$) and is heavily concentrated in balls of radius
$\ll \sqrt{n}$ around the lattice points. Now for each $z \in \mathbb{R}^n$ we consider the "$z$-shifted" version of $F$,
$F_z(x) = F(x + z)$. The central idea is that if $d(z,L) \le 1$, then (informally speaking) $F_z$ has large
"overlap" with $F$, since the centers of mass shift only slightly. On the other hand, if $d(z,L) > c\sqrt{n}$,
then $F_z$ and $F$ have negligible overlap, since the masses of the two functions are then concentrated
on disjoint sets of balls. In the proof system, Prover aims to convince Verifier that this overlap is
indeed negligible when $z = v$ is the target vector. To this end, Verifier asks to receive the "state"

$|\xi\rangle = \sum_{x \in \mathbb{R}^n} F(x)|x\rangle$

describing the pointwise behavior of $F(x)$. (Here and throughout, we give a simplified, idealized
description; the actual protocol uses points $x$ of finite precision, over a bounded region that captures
the behavior of the $L$-periodic function $F$. Thus all sums become finite, and states can be properly
normalized.) We think of $|\xi\rangle$ as the "correct" proof which Verifier hopes to receive from Prover;
note that $|\xi\rangle$ is independent of $v$.

Verifier cannot hope to recover an arbitrary value from among the exponentially many values
stored in $|\xi\rangle$. However, given $|\xi\rangle$, an elegant technique allows Verifier to estimate the overlap of $F$
with $F_v$, if the overlap $\langle F, F_v\rangle$ is defined (imprecisely) as

$\langle F, F_v\rangle = \sum_{x \in \mathbb{R}^n} F(x)F(x + v)$.

16 Actually, for the quantum protocol of [6], we need the slightly stronger promise that when the input satisfies
$d(v,L) > c\sqrt{n}$, it also holds that $\lambda(L) > c\sqrt{n}$. A proof system for this restricted problem still yields a proof system
for SVP with approximation factor $c\sqrt{n}$.
Aharonov and Regev show that the overlap measure $\langle F, F_v\rangle$ they define is extremely close to
$e^{-\pi d(v,L)^2/4}$ for any $v$, provided that $\lambda(L) > c\sqrt{n}$, so that it is indeed a good indicator of distance
from the lattice.

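To estimate this overlap, Verifier first appends a "control" qubit to $|\xi\rangle$, initialized to $(1/\sqrt{2})(|0\rangle + |1\rangle)$.
Conditioned on the control qubit being 1, he applies the transformation $T_v$ which takes $|y\rangle$ to
$|y - v\rangle$; this yields the state $(1/\sqrt{2})(|0\rangle|\xi\rangle + |1\rangle T_v|\xi\rangle)$. He then applies a Hadamard transformation
to the control qubit, yielding

$\frac{1}{2}\left[|0\rangle(|\xi\rangle + T_v|\xi\rangle) + |1\rangle(|\xi\rangle - T_v|\xi\rangle)\right]$.

Finally, he measures the control qubit, which equals 1 with probability

$\frac{1}{4}\left(\langle\xi| - \langle\xi|T_v^*\right)\left(|\xi\rangle - T_v|\xi\rangle\right) = \frac{1}{2}\left(1 - \mathrm{Re}(\langle\xi|T_v|\xi\rangle)\right)$.

Consulting the definition of $|\xi\rangle$ and using that $F$ is real-valued, we have $\mathrm{Re}(\langle\xi|T_v|\xi\rangle) = \langle F, F_v\rangle$, so
the probability of measuring a 1 is linearly related to the overlap Verifier wants to estimate.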
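The control-qubit procedure can be simulated directly on a toy state to confirm that the probability of measuring 1 equals $(1 - \mathrm{Re}\langle\xi|T_v|\xi\rangle)/2$. The sketch below makes several simplifying assumptions that are not in the text: the domain is the cyclic group $\mathbb{Z}_N$ rather than a discretized region of $\mathbb{R}^n$, the amplitudes are real, and all function names are hypothetical.

```python
from math import exp, pi, sqrt


def normalize(vec):
    """Scale a real amplitude vector to unit Euclidean norm."""
    norm = sqrt(sum(a * a for a in vec))
    return [a / norm for a in vec]


def shift(vec, v):
    """T_v maps |y> to |y - v>, so the new amplitude at x is the old one at x + v."""
    n = len(vec)
    return [vec[(x + v) % n] for x in range(n)]


def prob_control_is_one(xi, v):
    """Probability of measuring 1 on the control qubit after the steps above."""
    shifted = shift(xi, v)
    # After the controlled shift: (|0>|xi> + |1> T_v|xi>) / sqrt(2).
    branch0 = [a / sqrt(2) for a in xi]
    branch1 = [a / sqrt(2) for a in shifted]
    # Hadamard on the control sends the |1> component to (branch0 - branch1)/sqrt(2).
    new1 = [(a - b) / sqrt(2) for a, b in zip(branch0, branch1)]
    return sum(a * a for a in new1)


def overlap(xi, v):
    """<xi|T_v|xi> = sum_x xi(x) xi(x + v) for real amplitudes."""
    return sum(a * b for a, b in zip(xi, shift(xi, v)))
```

Taking `xi` to be a normalized Gaussian bump on $\mathbb{Z}_{64}$, the two quantities `prob_control_is_one(xi, v)` and `(1 - overlap(xi, v)) / 2` agree up to floating-point error for every shift `v`, and the probability grows as the shift moves the bump away from itself.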

The procedure above allows Verifier to estimate $d(v,L)$, on the assumption that Prover supplies
the correct quantum encoding of $F$. But Prover could send any quantum state $|\psi\rangle$, so Verifier needs
to test that $|\psi\rangle$ behaves something like the desired state $|\xi\rangle$. In particular, for randomly chosen
vectors $z$ within a ball of radius $1/\mathrm{poly}(n)$ around the origin, the overlap $\langle F, F_z\rangle$, estimated by the
above procedure, should be about $e^{-\pi\|z\|^2/4}$.

Consider $h_\psi(z) = \mathrm{Re}(\langle\psi|T_z|\psi\rangle)$ as a function of $z$. If this function could be arbitrary as we
range over choices of $|\psi\rangle$, then observing that $h_\psi(z) \approx h_\xi(z)$ for short vectors $z$ would give us
no confidence that $h_\psi(v) \approx \langle F, F_v\rangle$. However, Aharonov and Regev show that for every $|\psi\rangle$, the
function $h_\psi(z)$ obeys a powerful constraint called positive definiteness. They then prove that any
positive definite function which is sufficiently "close on average" to the Gaussian $e^{-\pi\|z\|^2/4}$ in a
small ball around 0, cannot simultaneously be too close to zero at any point within a distance 1
of $L$. Thus if $d(v,L) \le 1$, any state Prover sends must either fail to give the expected behavior
around the origin, or fail to suggest negligible overlap of $F$ with $F_v$. We have sketched the core
ideas of the quantum proof system; the actual protocol requires multiple copies of $|\xi\rangle$ to be sent,
with additional, delicate layers of verification.

The classical proof system We now turn to see how some of the ideas of this proof system
were adapted in [7] to give a classical, deterministic proof system for $\mathrm{GapCVP}_{c\sqrt{n}}$. The first new
insight is that for any lattice $L$, there is an $L$-periodic function $f(x)$ which behaves similarly to
$F(x)$ in the previous protocol, and which can be efficiently and accurately approximated at any
point, using a polynomial-sized classical advice string. (The function $f(x)$ is defined as a sum of
Gaussians, one centered around each point in $L$.) This advice is not efficiently constructible given
a basis for $L$, but we may still hope to verify this advice if it is supplied by a powerful prover.

These approximations are derived using ideas of Fourier analysis. First we introduce the dual
lattice $L^*$, which consists of all $w \in \mathbb{R}^n$ satisfying $\langle w, y\rangle \in \mathbb{Z}$ for all $y \in L$ (here we use $\langle\cdot,\cdot\rangle$ to denote
inner product). For example, we have $(\mathbb{Z}^n)^* = \mathbb{Z}^n$. Note that for $w \in L^*$, $r_w(x) = \cos(2\pi\langle w, x\rangle)$
is an $L$-periodic function. In fact, any sufficiently smooth $L$-periodic real-valued function, such as
$f(x)$, can be uniquely expressed as

$f(x) = \sum_{w \in L^*} \hat{f}(w)\cos(2\pi\langle w, x\rangle)$,
where the numbers $\hat{f}(w) \in \mathbb{R}$ are called the Fourier coefficients of $f$. Our choice of $f$ has particularly
nicely-behaved Fourier coefficients: they are nonnegative and sum to 1, yielding a probability
distribution (denoted by $\hat{f}$). Thus

$f(x) = \mathbb{E}_{w \sim \hat{f}}[\cos(2\pi\langle w, x\rangle)]$.



It follows from standard sampling ideas that a large enough ($N = \mathrm{poly}(n)$-sized) set of samples
$W = w_1, \ldots, w_N$ from $\hat{f}$ will, with high probability, allow us to accurately estimate $f(x)$ at every
point in a large, fine grid about the origin, by the approximation rule

$f_W(x) = \frac{1}{N}\sum_{i=1}^{N} \cos(2\pi\langle w_i, x\rangle)$.

Since $f$ is $L$-periodic and smooth, this yields good estimates everywhere. So, we let the proof string
consist of a matrix $W$ of column vectors $w_1, \ldots, w_N$. Since Prover claims that the target vector $v$
is quite far from the lattice, and $f$ is concentrated around the lattice points, Verifier expects to see
that $f_W(v)$ is small, say, less than 1/2.

As usual, Verifier must work to prevent Prover from sending a misleading proof. The obvious
first check is that the vectors $w_i$ sent are indeed in $L^*$. For the next test, a useful fact (which relies
on properties of the Gaussians used to define $f$) is that samples drawn from $\hat{f}$ are not too large
and "tend not to point in any particular direction": $\mathbb{E}_{w \sim \hat{f}}[\langle u, w\rangle^2] \le 1/(2\pi)$ for any unit vector $u$.
This motivates a second test to be performed on the samples $w_1, \ldots, w_N$: check that the largest
eigenvalue of $WW^T$ is at most $3N$.

In a sense, this latter test checks that $f_W$ has the correct "shape" in a neighborhood of the
origin, and plays a similar role to the random-sampling test from the quantum protocol.$^{17}$ Like its
quantum counterpart, this new test is surprisingly powerful: if $W$ is any collection of dual-lattice
vectors satisfying the eigenvalue constraint, and $d(v,L) \le 1/100$, then $f_W(v) \ge 1/2$ and Prover
cannot use $W$ to make $v$ pass the test. On the other hand, if $d(v,L) > c\sqrt{n}$ for large enough $c$, then
choosing the columns of $W$ according to $\hat{f}$ yields a witness that with high probability passes the two
tests and satisfies $f_W(v) < 1/2$. Scaling by a factor 100 gives a proof system for $\mathrm{GapCVP}_{100c\sqrt{n}}$.
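A one-dimensional toy version shows how the witness $W$ is used. The sketch below rests on assumptions that go beyond the text: it takes $L = \mathbb{Z}$ (so $L^* = \mathbb{Z}$), uses the specific Gaussian $e^{-\pi t^2}$ to define $f$, truncates all sums at a hypothetical `cutoff`, and uses hypothetical function names. For this choice, Poisson summation gives Fourier coefficients proportional to $e^{-\pi w^2}$, which are nonnegative and sum to 1 after normalization.

```python
import random
from math import cos, exp, pi


def f_exact(x, cutoff=10):
    """f(x) for L = Z: periodized Gaussian exp(-pi t^2), normalized so that f(0) = 1."""
    num = sum(exp(-pi * (x - k) ** 2) for k in range(-cutoff, cutoff + 1))
    den = sum(exp(-pi * k ** 2) for k in range(-cutoff, cutoff + 1))
    return num / den


def sample_dual(num_samples, rng, cutoff=10):
    """Draw dual-lattice points w in Z with probability proportional to exp(-pi w^2)."""
    support = list(range(-cutoff, cutoff + 1))
    weights = [exp(-pi * w * w) for w in support]
    return rng.choices(support, weights=weights, k=num_samples)


def f_w(samples, x):
    """The approximation rule: the average of cos(2 pi w x) over the samples."""
    return sum(cos(2 * pi * w * x) for w in samples) / len(samples)


def passes_eigenvalue_test(samples):
    """In one dimension W W^T is the scalar sum of w_i^2; compare it with 3N."""
    return sum(w * w for w in samples) <= 3 * len(samples)
```

With a few thousand samples drawn from a seeded `random.Random`, `f_w(W, x)` tracks `f_exact(x)` closely on $[0, 1]$, and an honestly sampled `W` passes the eigenvalue test with a wide margin. The interesting direction of the real proof system, namely that no cheating $W$ can pass the tests when $v$ is close to $L$, is not exercised by this sketch.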

5.3 A guide to further literature 

In this section we give pointers to other examples where quantum techniques are used to obtain 
classical results in some way or other: 

• Data structure lower bounds: Using linear-algebraic techniques, Radhakrishnan et al. [103]
proved lower bounds on the bit-length of data structures for the set membership problem with
quantum decoding algorithms. Their bounds of course also apply to classical decoding
algorithms, but are in fact stronger than the previous classical lower bounds of Buhrman et
al. [31]. Sen and Venkatesh did the same for data structures for the predecessor problem [113],
proving a "round elimination" lemma in the context of quantum communication complexity
that is stronger than the best known classical round elimination lemmas. A further
strengthening of their lemma was subsequently used by Chakrabarti and Regev [35] to obtain optimal
lower bounds for the approximate nearest neighbour problem.

17 Indeed, the testing ideas are similar, due to the fact that for any $W$, the function $f_W$ obeys the same "positive
definiteness" property used to analyze the quantum protocol.




• Formula lower bounds: Recall that a formula is a binary tree whose internal nodes are AND
and OR-gates, and each leaf is a Boolean input variable $x_i$ or its negation. The root of the
tree computes a Boolean function of the inputs in the obvious way. Proving superpolynomial
formula lower bounds for specific explicit functions in NP is a long-standing open problem
in complexity theory, the hope being that such a result would be a stepping-stone towards
the superpolynomial circuit lower bounds that would separate P from NP (currently, not
even superlinear bounds are known). The best proven formula-size lower bounds are nearly
cubic [56], but a large class of quadratic lower bounds can be obtained from a quantum result:
Laplante et al. [78] showed that the formula size of $f$ is lower bounded by the square of the
quantum adversary bound for $f$ (mentioned in Section 5.1). Since random functions, as well
as many specific functions like Parity and Majority, have linear adversary bounds, one obtains
many quadratic formula lower bounds this way.

• Circuit lower bounds: Kerenidis [69] describes an approach to prove lower bounds for
classical circuit depth using quantum multiparty communication complexity. Roughly speaking,
the idea is to combine two classical parties into one quantum party, prove lower bounds in 
the resulting quantum model, and then translate these back to strong lower bounds for the 
classical model. (Unfortunately, it seems hard to prove good lower bounds for the resulting 
quantum model [8"T].) 

• Horn's problem is to characterize the triples $\mu$, $\nu$, and $\lambda$ of vectors of integers for which
there exist Hermitian operators $A$ and $B$ such that $\mu$, $\nu$, and $\lambda$ are the spectra of $A$, $B$, and
$A + B$, respectively. It is known (and quite non-trivial) that this problem is equivalent to
a question about the representation theory of the group $GL(d)$ of invertible $d \times d$ complex
matrices. Christandl [38] reproved a slightly weaker version of this equivalence based on
quantum information theory.

• Secret-key distillation: In cryptography, Gisin, Renner, and Wolf [50] used an analogy
with "quantum bound entanglement" to provide evidence against the conjecture that the
"intrinsic information" in a random variable shared by Alice, Bob, and eavesdropper Eve
always equals the amount of secret key that Alice and Bob can extract from this; later this
conjecture was indeed disproved [110], though without using quantum methods.

• Equivalence relations and invariants: A function $f$ is said to be a complete invariant for
the equivalence relation $R$ if $(x,y) \in R \Leftrightarrow f(x) = f(y)$. It is an interesting question whether
every polynomial-time computable equivalence relation has a polynomial-time computable
complete invariant. Fortnow and Grochow [47, Theorem 4.3] show this would imply that
the class UP reduces to Simon's problem [118] (and hence would be in BQP, which seems
unlikely).

• Learning positive semidefinite matrices: Tsuda, Rätsch, and Warmuth [120] study
online learning of positive semidefinite matrices using properties of von Neumann divergence
(also known as quantum relative entropy) in their analysis in order to measure differences
between density matrices.

• Johansson's theorem asymptotically equates the expected shape of the semi-standard
tableau produced by a random word in $k$ letters, with the spectrum of a certain random
matrix. Kuperberg [74] provides a proof of this theorem by treating the random matrix as a
quantum random variable on the space of random words. In another paper, Kuperberg [75]
also proved a central limit theorem for non-commutative random variables in a von Neumann
algebra with a tracial state, based on quantum ideas.



6 Conclusion 

In this paper we surveyed the growing list of applications of quantum computing techniques to 
non-quantum problems, in areas ranging from theoretical computer science to pure mathematics. 
These proofs build on current research in quantum computing, but do not depend on whether a 
large-scale quantum computer will ever be built. One could even go further, and use mathematical 
frameworks that go "beyond quantum" as a proof-tool. The use of PostBQP in Section 4.3 is in
this vein, since we don't expect postselection to be physically implementable. 

We feel that "thinking quantumly" can be a source of insight and of charming, surprising proofs: 
both proofs of new results and simpler proofs of known results (of course, "simplicity" is in the 
eye of the beholder here). While the examples in this survey do not constitute a full-fledged proof 
method yet, our hope is that both the quantum toolbox and its range of applications will continue 
to grow. 
A The most general quantum model

The ingredients introduced in Section 2.1 are all the quantum mechanics we need for the applications
of this survey. However, a more general formalism exists, which we will explain here with a view 
to future applications — who knows what these may need! 

First we generalize pure states. In the classical world we often have uncertainty about the 
state of a system, which can be expressed by viewing the state as a random variable that has 
a certain probability distribution over the set of basis states. Similarly we can define a mixed 
quantum state as a probability distribution (or "mixture") over pure states. While pure states are 
written as vectors, it is most convenient to write mixed states as density matrices. A pure state 
\(j>) corresponds to the rank-one density matrix |^)(^|, which is the outer product of the vector 
with itself. A mixed state that is in (not necessarily orthogonal) pure states \<pi ),■■■, \(j>k) with 
probabilities Px,---,Pk, respectively, corresponds to the density matrix Yli=iPi\ c t ) i)( c t ) i\- The set of 
density matrices is exactly the set of positive semidefinite (PSD) matrices of trace 1. A mixed state 
is pure if, and only if, its density matrix has rank 1. 
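
As a small numerical illustration of these definitions, the following sketch builds the density matrix of a mixture of two non-orthogonal qubit states and checks that it has trace 1. The helper names (`outer`, `mix`, `purity`) and the choice of states are assumptions made for this example. Instead of computing the rank directly, it uses the standard equivalent test that a density matrix $\rho$ has rank 1 exactly when $\mathrm{Tr}(\rho^2) = 1$.

```python
import math

def outer(v, w):
    """Outer product |v><w| as a list-of-lists matrix."""
    return [[a * b.conjugate() for b in w] for a in v]

def matmul(a, b):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

def mix(states, probs):
    """Density matrix sum_i p_i |phi_i><phi_i| of a mixture of pure states."""
    d = len(states[0])
    rho = [[0j] * d for _ in range(d)]
    for phi, p in zip(states, probs):
        proj = outer(phi, phi)
        for r in range(d):
            for c in range(d):
                rho[r][c] += p * proj[r][c]
    return rho

def purity(rho):
    """Tr(rho^2), which equals 1 exactly when rho has rank 1 (is pure)."""
    return trace(matmul(rho, rho)).real

# Illustrative states: |0> and |+> = (|0> + |1>)/sqrt(2), which are not orthogonal.
zero = [1 + 0j, 0j]
plus = [1 / math.sqrt(2) + 0j, 1 / math.sqrt(2) + 0j]

rho_pure = mix([plus], [1.0])              # the pure state |+><+|
rho_mixed = mix([zero, plus], [0.5, 0.5])  # an equal mixture of |0> and |+>

print(trace(rho_mixed).real, purity(rho_pure), purity(rho_mixed))
```

Here `rho_mixed` works out to the matrix with rows (0.75, 0.25) and (0.25, 0.25), whose purity is 0.75, so it is not a pure state.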

The most general quantum operation on density matrices is a completely-positive, trace-preserving 
(CPTP) map. By definition, this is a linear map that sends density matrices to density matrices, 
even when tensored with the identity operator on another space. Alternatively, a map $S : \rho \mapsto S(\rho)$ 
from $d \times d$ matrices to $d' \times d'$ matrices is a CPTP map if, and only if, it has a Kraus representation: 
there are $d' \times d$ matrices $M_1, \ldots, M_k$, satisfying $\sum_{i=1}^k M_i^* M_i = I$, such that $S(\rho) = \sum_{i=1}^k M_i \rho M_i^*$ 
for every $d \times d$ density matrix $\rho$. A unitary map $U$ corresponds to $k = 1$ and $M_1 = U$, so unitaries 
act on mixed states by conjugation: $\rho \mapsto U \rho U^*$. Note that a CPTP map can change the 
dimension of the state. For instance, the map that traces out ("throws away") the second register 
of a 2-register state is a CPTP map. Formally, this map is defined on tensor-product states as 
$\mathrm{Tr}_2(A \otimes B) = A$, and extended to all 2-register states by linearity. 
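
The Kraus condition and the dimension change can be checked on a small case. The sketch below is an illustration under our own choices, not part of the original text: it takes two qubits ($d = 4$, $d' = 2$) and uses the Kraus operators $M_j = I \otimes \langle j|$ for tracing out the second register, which is one standard choice. It verifies that $\sum_j M_j^* M_j = I$ and that the map sends a product state $A \otimes B$ to $A$. All helper names are hypothetical.

```python
def matmul(a, b):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def dagger(m):
    """Conjugate transpose M^*."""
    return [[m[r][c].conjugate() for r in range(len(m))]
            for c in range(len(m[0]))]

def kron(a, b):
    """Tensor (Kronecker) product of two matrices."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def apply_channel(kraus, rho):
    """S(rho) = sum_i M_i rho M_i^* for a list of Kraus operators."""
    out = None
    for m in kraus:
        term = matmul(matmul(m, rho), dagger(m))
        out = term if out is None else add(out, term)
    return out

I2 = [[1 + 0j, 0j], [0j, 1 + 0j]]
bras = [[[1 + 0j, 0j]], [[0j, 1 + 0j]]]   # <0| and <1| as 1 x 2 matrices
kraus = [kron(I2, bra) for bra in bras]    # M_j = I (x) <j|, each 2 x 4

# Completeness check: sum_j M_j^* M_j should be the 4 x 4 identity.
completeness = None
for m in kraus:
    term = matmul(dagger(m), m)
    completeness = term if completeness is None else add(completeness, term)

# Two illustrative single-qubit density matrices.
A = [[0.75 + 0j, 0.25 + 0j], [0.25 + 0j, 0.25 + 0j]]
B = [[0.5 + 0j, 0j], [0j, 0.5 + 0j]]
reduced = apply_channel(kraus, kron(A, B))  # should equal A

print(completeness)
print(reduced)
```

Each Kraus operator here is a $2 \times 4$ matrix, so the map takes $4 \times 4$ density matrices to $2 \times 2$ ones.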

CPTP maps also include measurements as a special case. For instance, a projective measurement 
with projectors $P_1, \ldots, P_k$ that writes the classical outcome in a second register corresponds 
to a CPTP map $S$ with Kraus operators $M_i = P_i \otimes |i\rangle$. We now have 

$$S(\rho) = \sum_{i=1}^k P_i \rho P_i^* \otimes |i\rangle\langle i| = \sum_{i=1}^k \frac{P_i \rho P_i^*}{\mathrm{Tr}(P_i \rho P_i^*)} \otimes \mathrm{Tr}(P_i \rho P_i^*) \, |i\rangle\langle i| .$$



This writes the classical value $i$ in the second register with probability $\mathrm{Tr}(P_i \rho P_i^*)$, and "collapses" 
$\rho$ in the first register to its normalized projection in the subspace corresponding to $P_i$. 
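
To make the measurement map concrete, here is a minimal sketch for a single qubit measured in the computational basis, with outcomes labelled 0 and 1 rather than 1 and 2. The input state $|+\rangle\langle +|$, the projectors, and all helper names are assumptions chosen for this example. The Kraus operators $M_i = P_i \otimes |i\rangle$ are $4 \times 2$ matrices, and the output is a two-register state whose second register holds the outcome.

```python
def matmul(a, b):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def dagger(m):
    """Conjugate transpose M^*."""
    return [[m[r][c].conjugate() for r in range(len(m))]
            for c in range(len(m[0]))]

def kron(a, b):
    """Tensor (Kronecker) product of two matrices."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Projectors P_0 = |0><0| and P_1 = |1><1|.
P = [[[1 + 0j, 0j], [0j, 0j]],
     [[0j, 0j], [0j, 1 + 0j]]]
# Basis kets |0>, |1> as 2 x 1 column matrices.
kets = [[[1 + 0j], [0j]], [[0j], [1 + 0j]]]
# Kraus operators M_i = P_i (x) |i>, each of size 4 x 2.
kraus = [kron(P[i], kets[i]) for i in range(2)]

# Input state |+><+|.
rho = [[0.5 + 0j, 0.5 + 0j], [0.5 + 0j, 0.5 + 0j]]

# S(rho) = sum_i M_i rho M_i^*, a 4 x 4 two-register state.
out = [[0j] * 4 for _ in range(4)]
for m in kraus:
    out = add(out, matmul(matmul(m, rho), dagger(m)))

# Outcome probabilities Tr(P_i rho P_i^*).
probs = [sum(matmul(matmul(p, rho), dagger(p))[j][j] for j in range(2)).real
         for p in P]

print(out)
print(probs)
```

With this input, each outcome has probability 1/2 and the output is the equal mixture of $|0\rangle\langle 0| \otimes |0\rangle\langle 0|$ and $|1\rangle\langle 1| \otimes |1\rangle\langle 1|$: the off-diagonal entries of the input have disappeared.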

While this framework of mixed states and CPTP maps looks more general than the earlier 
framework of pure states and unitaries, philosophically speaking it is not: every CPTP map can be 
implemented unitarily on a larger space. What this means is that for every CPTP map S, there 
exists a state $\rho_0$ on an auxiliary space, and a unitary $U$ on the joint space, such that for every $\rho$, the 
state $S(\rho)$ equals what one gets by tracing out the auxiliary register from the state $U(\rho \otimes \rho_0)U^*$. 
We refer to the book of Nielsen and Chuang [98, Section 8.2] for more details. 
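
One simple instance of this statement can be checked by hand or numerically. The sketch below is an illustrative example, not taken from the original text: it considers the single-qubit map $S(\rho) = P_0 \rho P_0^* + P_1 \rho P_1^*$ (a computational-basis measurement whose outcome is discarded) and compares it with the unitary implementation that takes $\rho_0 = |0\rangle\langle 0|$ on an auxiliary qubit and a CNOT gate as $U$. The test state and the helper names are hypothetical.

```python
def matmul(a, b):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def dagger(m):
    """Conjugate transpose M^*."""
    return [[m[r][c].conjugate() for r in range(len(m))]
            for c in range(len(m[0]))]

def kron(a, b):
    """Tensor (Kronecker) product of two matrices."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def trace_out_second(m):
    """Partial trace over the second qubit of a 4 x 4 two-qubit matrix."""
    return [[sum(m[2 * i + k][2 * j + k] for k in range(2)) for j in range(2)]
            for i in range(2)]

# CNOT with the first qubit as control and the auxiliary qubit as target.
CNOT = [[1 + 0j, 0j, 0j, 0j],
        [0j, 1 + 0j, 0j, 0j],
        [0j, 0j, 0j, 1 + 0j],
        [0j, 0j, 1 + 0j, 0j]]
rho0 = [[1 + 0j, 0j], [0j, 0j]]   # auxiliary state |0><0|

P = [[[1 + 0j, 0j], [0j, 0j]],     # P_0 = |0><0|
     [[0j, 0j], [0j, 1 + 0j]]]     # P_1 = |1><1|

# An arbitrary illustrative input density matrix.
rho = [[0.75 + 0j, 0.25 + 0j], [0.25 + 0j, 0.25 + 0j]]

# The map computed from its Kraus operators P_0, P_1.
via_kraus = add(matmul(matmul(P[0], rho), dagger(P[0])),
                matmul(matmul(P[1], rho), dagger(P[1])))

# The same map computed as Tr_2[ U (rho (x) rho0) U^* ].
joint = matmul(matmul(CNOT, kron(rho, rho0)), dagger(CNOT))
via_unitary = trace_out_second(joint)

print(via_kraus)
print(via_unitary)
```

Both computations give the diagonal matrix with entries 0.75 and 0.25. This checks the claim for one map and one input only; the general construction is the one in the cited reference.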